Various statistical models and techniques were employed to
forecast the existence of low-level stratus conditions. They
are illustrated for data at a single station (Moffett Field,
Sunnyvale, California) using single-station surface meteorological
measurements only as explanatory variables. A preliminary
exploratory data analysis shows that low (high) dew point
depression is associated with the existence (non-existence) of
low-level stratus at Moffett Field. Procedures for and results
of various methods of fitting logistic models to the data are
described. The fitted models were used to forecast stratus on
reserved data sets (cross-validation). Results of the cross-

LOW-LEVEL STRATUS PREDICTION USING BINARY STATISTICAL
REGRESSION: A PROGRESS REPORT, USING MOFFETT FIELD DATA

Donald P. Gaver Patricia A. Jacobs 

Operations Research Department 
Naval Postgraduate School 

0. Executive Summary

In this paper various statistical models and techniques
are employed to forecast the existence of low-level stratus
conditions. They are illustrated for data at an airport
(Moffett Field, Sunnyvale, California).

In Section 2 the data set is described and the results 
of a preliminary exploratory data analysis are given. These 
suggest that dew point depression should be predictive of the 
existence of stratus. Generally, low (high) dew point depres- 
sion is associated with the existence (non-existence) of stra- 
tus. This association is also made evident by a spectral 
analysis of hourly stratus levels and dewpoint depression 
described in Appendix F. 

The remainder of this paper describes procedures for, and
results of, fitting logistic models to the data described.
Validation of the models is addressed as well. The basic
logistic model is

$$P\{Y = 1 \mid \text{explanatory variable } x\} = \frac{\exp\{x\beta\}}{1 + \exp\{x\beta\}}$$

where $x$ is a p-vector (row) of explanatory variables and $\beta$
is a p-vector (column) of coefficients to be determined.
Appendix E suggests several mathematical justifications for
use of the logistic regression model.
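
As a minimal sketch of the logistic form above, the following Python function evaluates P{Y = 1 | x} for a row vector x and a coefficient vector β. The function name and the coefficient values are hypothetical and serve only as an illustration; they are not fitted values from this report.

```python
import math

def logistic_probability(x, beta):
    """Return P{Y = 1 | x} = exp(x.beta) / (1 + exp(x.beta))."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))  # linear predictor x.beta
    # Use the algebraically equivalent form that avoids overflow for large |eta|.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

# Hypothetical coefficients: a constant term and one explanatory variable.
beta = [2.0, -1.0]
p_mid = logistic_probability([1.0, 2.0], beta)   # x.beta = 0
p_low = logistic_probability([1.0, 0.0], beta)   # x.beta = 2
```

With a negative coefficient on the explanatory variable, smaller values of that variable give a larger forecast probability, which is the direction of association reported here for dew point depression.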

We have used various methods to fit logistic models for
use as predictors on reserved data sets (cross-validation).
Our cross-validation experiences are reported in Appendices A
through D. Appendix G contains the asymptotic distribution of
a threat score, which is one of the statistics we use to compare
procedures.

Appendix A reports on use of the stepwise logistic regression
procedure of the BMDP computer package. The procedure
chooses variables to be used in the regression from a menu of
variables given to it. The BMDP fits are then used to predict
the occurrence of stratus for independent data, i.e. from
different years. We find that the stepwise feature must be used
with caution; it tends to overfit, including variables which
greatly increase the standard error of the variables first
included in the regression. Such overfitting degrades the
predictive powers of the model.

Copas (1983) points out that a regression model, fit by
maximum likelihood (or least squares) to one set of data, and
then used for prediction on another set of data, nearly always
fits or predicts the new set of data less well than it does the
original set. This phenomenon of shrinkage can become more
pronounced if the original regression model is fit using a
stepwise procedure, which tends to overfit. Appendix B describes
and investigates a procedure suggested by Copas to compensate
for shrinkage on both regressions fit with, and without, stepwise
procedures. In our application, particularly when predicting
changes from stratus to no stratus, the shrinkage procedure
appears to help. However, it appears to do less well in
predicting changes from no stratus to stratus.

In Appendix C robust estimation procedures for logistic
regression are described and carried out on the Moffett Field
data. These procedures are less vulnerable than maximum
likelihood estimates to a few outlying data points which may not
agree with the model. For the particular cross validations
performed, predictions using models fit with robust procedures
were no better than predictions made with the models fit with
maximum likelihood. The models obtained are, however,
systematically different from their classical counterparts.

In Appendix D we investigate the predictive use of lo- 
gistic regression models that are progressively updated to em- 
phasize recent data. The suggestion is that models fit with 
data which are closer in time to the dates on which forecasts 
are to be made may be more relevant, owing to changing condi- 
tions not represented in the model, than a model which is fit 
with data of several previous years. We found that models 
with updating often did at least as well as models without an 
updating feature. 

In summary, we have found that ln(dew point depression + 1)
appears to be a consistently useful predictor of the occurrence
of stratus. Low (high) dew point depression is associated with
stratus (no stratus). There is no one procedure or model, among
those tried to date, that appears a clear winner. If, for
example, one procedure does well in predicting changes from no
stratus to stratus, it will often do less well in predicting
changes from stratus to no stratus. We found that none of the
procedures did as well predicting the occurrence of stratus in
1962 as it did in 1961. This suggests that perhaps 1962 is not
described by the present models as well as is 1961, being
intrinsically quite different from the previous years 1958-61.
Models and methods that represent year-to-year differences will
come under investigation in future.

Further work, with other models, and with data from other
locations, will be undertaken to shed light on this important
prediction problem.
1. Introduction and Overview

The purpose of this paper is to exhibit the use of
statistical tools and procedures for forecasting the existence of
low-level stratus conditions at an airport. The existence of
low stratus (less than or equal to 1000 ft.) forces the use of
different methods of traffic handling than is the case when
higher stratus levels prevail. A low stratus condition tends
to inhibit flight operations, so it is desirable to forecast
its occurrence. Furthermore, it is of interest to forecast
such conditions on a "single-station" basis, making use of
meteorological measurements available only at the location
(e.g. airport) in question, in case useful supplementary
information is unavailable.

The forecasting approaches described here are statistical
in nature, meaning that extensive data concerning the
reported hourly stratus level at an airport (Moffett Field,
Sunnyvale, CA), together with certain other meteorological
measurements or parameters recorded and reported at that
location, were used as raw material for the forecasts. These data
were used to estimate the probability of low stratus during a
daily period; the latter probability was estimated using a
logistic regression model, a tool that has been found useful in
biological and medical statistics, and that has been previously
applied in meteorology; cf. Brelsford and Jones (1967),
Gilhausen (1979), Gabriel and Pun (1979). In a later section
we present various derivations or justifications of such a
model. Alternative models are also suggested, and the
usefulness of these will be investigated in future work.
The usefulness of the logistic (or any other) model must
be judged by its performance. We have chosen to proceed by (i)
fitting a model to data for certain specific years (1958-1960),
and then (ii) comparing the model predictions to actual
occurrences for a completely different period (1961, 1962). Such a
procedure is termed cross validation; see Mosteller and Tukey
(1977) for good general discussion and references. The results
of our cross validation are reported subsequently. Another
interesting and possibly useful approach is to construct and test
an adaptive, automatically up-dating forecasting model with
characteristics similar to "exponential smoothing" or "Kalman
filtering." Results of some simple updating procedures for
forecasting will also be reported.

Successful forecasting with the aid of a model requires
that the data inputs be relatively "clean," or in basic
conformity with the model. Occasionally occurring data points
that are out of line for any reason, called outliers, or
influential values, can radically change the values of the
model parameters obtained from statistical fitting principles
such as least squares (not used for fitting our logistic model)
or maximum likelihood (which is used). To check for such
maverick, possibly detrimentally influential, values it is
possible to proceed in several ways. One is to successively
remove each data point (actually a vector of response and
explanatory variables) and re-fit the model, watching for
radical changes in fitted model parameters. This method has
been programmed (in APL, on the NPS IBM 3033 system) and
exercised; its defect is that at present just one data point
is removed at a time, so if several points are mavericks this
fact may be overlooked. Clever ways of automatically diminishing
the effects of maverick points have been discussed by
Pregibon (1982); exploration of the applicability of such ideas
to the present stratus prediction problem is currently underway.
The methods and some results are reported here.
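
The single-point deletion check described above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the APL program mentioned in the text: the model has a constant and one explanatory variable, the fit is a plain Newton-Raphson maximum likelihood iteration, and the data values and the function names `fit_logistic` and `deletion_influence` are hypothetical.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Maximum likelihood fit of P(y=1|x) = 1/(1+exp(-(b0+b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the constant
            g1 += (y - p) * x      # score for the slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def deletion_influence(xs, ys):
    """Refit with each point removed in turn; return |change in slope| per point."""
    _, slope_all = fit_logistic(xs, ys)
    changes = []
    for i in range(len(xs)):
        _, slope_i = fit_logistic(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        changes.append(abs(slope_i - slope_all))
    return changes

# Hypothetical data: y = 1 tends to occur at low x, with some overlap.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [1, 1, 0, 1, 0, 1, 0, 0]
intercept, slope = fit_logistic(xs, ys)
influence = deletion_influence(xs, ys)
```

Points with the largest entries in `influence` are candidates for closer inspection. As the text notes, removing one point at a time can miss several mavericks acting together.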

Another approach to the identification of maverick data,
and to the possible discovery of an appropriate model, is by
computer graphics. We have initiated the examination of the
low-stratus data on a pioneering graphics facility at Stanford
Linear Accelerator (SLAC); see an article in Science, Kolata
(1982), for general description. The SLAC system allows an
analyst views of various three-dimensional space projections of
multidimensional data-clouds. Such examination helps to reveal
the association between certain explanatory ("independent")
variables and the response ("dependent variable") of interest.
For example, examination of our stratus data indicated that
changes in the explanatory variable dewpoint depression tended
to be reflected in changes of response, i.e. low stratus level
probability. This association has physical basis, and dewpoint
depression had actually been included in earlier exploratory
logistic fits at the suggestion of W. Sweet of NEPRF; its
incorporation into the model considerably improves predictive
performance.
2. The Basic Data Set

The statistical methods used in this study were applied
to data furnished by W. Sweet of NEPRF, to whom we are grateful.
In summary, these data consist of reported hourly determinations
of:

(i) stratus level, reported to be at discrete levels of
100 ft. separation; possible recorded levels are
$k \times 100$ ft., $k = 1, 2, \ldots, 9, 10, \ldots$; "999" (no visible
stratus).

(ii) east-west wind velocity, $V_x$, at surface, in miles
per hour;

(iii) north-south wind velocity, $V_y$, at surface, in miles
per hour;

(iv) temperature, at surface, in degrees F;

(v) dewpoint, at surface, in degrees F;

all at Moffett Field, California, for the months of July,
August, and September of the years 1958-1962; later data are
also available, and remain to be analyzed. Although other
measurements, e.g. of pressure, are in principle available, they
were not utilized in the present analysis. Nor were measurements
from neighboring locations in the San Francisco Bay area.

2.1. The Forecasting Exercise Data Set

The raw data described above were adapted to the forecasting
exercises as follows:

(a) Forecasts are made of the existence of stratus
level less than 1000 ft. ($\le$ 900 ft.) on any hour between
10:00 pm (2200) on day t, and 6:00 am (0600) on day
t + 1. If hourly-reported stratus level ever fell to a
level $\le$ 900 ft. during such a period beginning on day t,
it is agreed to say that stratus existed on day t; otherwise
that no stratus existed on day t. Denote by the binary
indicator variable $y_t$ the existence (non-existence) of
stratus on day t according to the above definition. Thus

$$y_t = \begin{cases} 1 & \text{if stratus exists on day } t, \\ 0 & \text{if no stratus exists on day } t. \end{cases}$$

Call $y_t$ the response (or dependent variable) when forecasting
for day t. Note that the observed values of response on previous
days ($y_{t-1}, y_{t-2}, \ldots$) are available as assistance when
forecasting for day t. The above definition of meaningful
stratus agrees with instrument/no instrument landing rules at
airports, and is thus of operational significance.

Candidate explanatory (independent) variables are these:

(b) wind velocities at 6:00 pm (1800) on day t,
items (ii) and (iii) above;

(c) temperature ($T_t$) and dewpoint ($D_t$) at surface at
6:00 pm on day t;

(d) dewpoint depression, $\Delta_t = T_t - D_t$ at 6:00 pm
on day t;

(e) hours of stratus ($H_{t-1}$) between 2200 on the previous
day t-1 and 0600 on the current day t;

(f) existence/non-existence of stratus ($y_{t-1}, y_{t-2}, \ldots$)
on previous days.

Let $NS_t$ denote the number of consecutive days of stratus in
a run of stratus days that includes day t-1, the day on which
the prediction is made. $NNS_t$ is the number of consecutive days
of no-stratus in a run of no-stratus days that includes day
t-1.

Note that because of the way in which the response $y_t$
is defined, it is legitimate and of interest to forecast $y_t$ in
terms of $T_t$, $D_t$, $\Delta_t$, $V_x(t)$, etc. These latter quantities are
all available at 6:00 pm for forecasts applying later, i.e. from
10:00 pm to 6:00 am on the following day. Of course many other
functions of the hourly observations are candidates for
explanatory variable status.
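
The definitions of the response and of the run-length variables can be illustrated with a short Python sketch. It assumes, for illustration only, that each night's hourly stratus levels (2200 to 0600) are given as a list of heights in feet, with `None` standing in for the "no visible stratus" code; the function names are hypothetical.

```python
def stratus_indicator(night_levels_ft, threshold_ft=900):
    """Return 1 if any hourly level in the night window is <= threshold, else 0."""
    return int(any(level is not None and level <= threshold_ft
                   for level in night_levels_ft))

def run_lengths(y):
    """For each day k, return (NS, NNS) for the run ending on day k.

    The pair for day k is what is available when forecasting day k + 1:
    NS counts consecutive stratus days, NNS consecutive no-stratus days.
    """
    ns, nns, out = 0, 0, []
    for value in y:
        if value == 1:
            ns, nns = ns + 1, 0
        else:
            ns, nns = 0, nns + 1
        out.append((ns, nns))
    return out

# Hypothetical nights: one with a 900 ft report, one never below 1000 ft.
y_a = stratus_indicator([None, 1200, 900, None])
y_b = stratus_indicator([None, 1500, 1000])
runs = run_lengths([0, 0, 1, 1, 1, 0])
```

Exactly one of NS and NNS is nonzero on any day, so the pair also encodes the previous day's stratus state.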
3. Preliminary Analysis

Before proceeding to the fitting of specific models, a
subset of the data has been examined in terms of simple
summaries. Since the objective is to forecast, we have divided
(conditioned) the data for the years 1958, 1959, 1960 into four
groups:

Group 00: observations such that $y_{t-1} = 0$, $y_t = 0$,
Group 01: observations such that $y_{t-1} = 0$, $y_t = 1$,
Group 10: observations such that $y_{t-1} = 1$, $y_t = 0$,
Group 11: observations such that $y_{t-1} = 1$, $y_t = 1$,

and have then computed summaries of the observed distributions
of certain candidate explanatory variables. The argument is
that a noticeable separation of such distributions when
predicting $y_t$ from the particular explanatory data suggests that the
variable in question may be useful in forecasting.

Note that we have explicitly used the known stratus state
of the system at t-1 as one important variable, wishing to
make full use of persistence, and to improve upon it. We are
especially interested in the power of explanatory variables and
their combinations to correctly forecast changes in stratus
conditions, e.g. from $y_{t-1} = 0$ (no stratus on day t-1) to
$y_t = 1$ (stratus on day t). Simple persistence forecasting,
which predicts $y_t = y_{t-1}$, will never identify prospective
changes.
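
The grouping by transition and the persistence baseline can be sketched as below. The daily series are hypothetical values chosen for illustration, not Moffett Field observations, and the function names are assumptions of this sketch.

```python
from statistics import median

def group_by_transition(y, delta):
    """Collect delta[t] into groups keyed by the transition y[t-1] -> y[t]."""
    groups = {"00": [], "01": [], "10": [], "11": []}
    for t in range(1, len(y)):
        groups[f"{y[t - 1]}{y[t]}"].append(delta[t])
    return groups

def persistence_fraction_correct(y):
    """Fraction of days on which the forecast y[t] = y[t-1] is correct."""
    hits = sum(1 for t in range(1, len(y)) if y[t] == y[t - 1])
    return hits / (len(y) - 1)

# Hypothetical daily stratus indicators and 6:00 pm dew point depressions.
y = [0, 0, 1, 1, 0, 0, 1]
delta = [15, 13, 7, 6, 10, 14, 8]

groups = group_by_transition(y, delta)
medians = {key: median(values) for key, values in groups.items()}
baseline = persistence_fraction_correct(y)
```

In this toy series the median depression is lower for group 01 than for group 00, the kind of separation the summaries are meant to reveal. Persistence is wrong on every day that a change occurs.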

Computer graphic analysis carried out at SLAC, plus
physical insight, suggest that dewpoint depression, $\Delta_t$, should
be an effective explanatory variable. Another useful variable
seems to be the hours of stratus observed on day t-1, denoted
by $H_{t-1}$. There are limitless other plausible explanatory
variables, as well as combinations and re-expressions
(transformations) of the latter, but here we look at only two. One
systematic way of uncovering predictive combinations of
explanatory variables is by use of some form of principal component or
factor analysis; such work is not reported here. It seems
possible that a robust principal component analysis may be
informative (see Gnanadesikan (1977), or Campbell (1982)), for the
existence of groups of maverick-like data has been reported in
the overall data base. Clustering procedures may also be of
value.

Tables 1 and 2 give a few useful summaries of the behavior
of the candidate explanatory variables $\Delta_t$ and $H_{t-1}$; these
have been developed for the years 1958, 1959, 1960. The figures
in parentheses are natural logarithms of their counterparts.
The log transformation is suggested to symmetrize the sample
distribution (histogram or Tukey stem-leaf plot), which often
tends to appear positively skewed for the above data. The
medians and quartiles are used instead of the ordinary means and
standard deviations because of the possible non-robust/resistant
properties of the latter traditional measures.

We can draw the following conclusions from Table 1:

(a) corresponding summary figures for dewpoint depression
(Q, M, Q) are rather stable from year to year.

(b) dewpoint depression (or its log) should have prognostic
power: roughly speaking,
TABLE 1

Observed Distribution of Dew Point Depression ($\Delta_t$)

Year    Transition    Lower Quartile    Median        Upper Quartile
                           (Q)            (M)              (Q)

1958    1 → 0           9 (2.2)         9 (2.2)        10 (2.3)
        1 → 1           6 (1.79)        7 (1.95)        9 (2.2)
        0 → 1           6 (1.79)        8 (2.08)        8 (2.08)
        0 → 0          10 (2.3)        13 (2.56)       17 (2.83)

1959    1 → 0           8 (2.08)        9 (2.2)        11 (2.4)
        1 → 1           6 (1.79)        7 (1.95)        9 (2.2)
        0 → 1           7 (1.95)        9 (2.2)        10 (2.3)
        0 → 0          10 (2.3)        14 (2.64)       16 (2.77)

1960    1 → 0           7 (1.95)       10 (2.3)        13 (2.56)
        1 → 1           6 (1.79)        8 (2.08)        9 (2.2)
        0 → 1           7 (1.95)        8 (2.08)       11 (2.4)
        0 → 0          10 (2.3)        13 (2.56)       16 (2.77)

(b-1) if stratus is present at time (day) t-1, and if
$\Delta_t$ is relatively high (9 or above), a change to
no stratus is indicated, while if $\Delta_t$ is relatively
low (below 9) the stratus condition tends
to continue; on the other hand
(b-2) if no stratus is present at time (day) t-1, and
if $\Delta_t$ is relatively high (10 or above) the no
stratus condition tends to continue, while if $\Delta_t$
is relatively low (below 10) changes to a stratus
condition become more frequent.

These results are physically plausible, and appear
consistently, if not overwhelmingly strongly, in the present data.

Figures 1 and 2 show box plots of dew point depression
and ln(dew point depression + 1) for the years 1958-60 (cf.
Mosteller and Tukey (1977)). Each of the four plots in the figures
contains only those points for which $y_{t-1} = i \to j = y_t$ for
i, j = 0, 1. The top (bottom) edge of the box is the upper
(lower) quartile of the data set; the symbol within the box is
at the median; the lines connect the means; and the circles
outside the boxes represent outlying data points.

It appears from the top two plots in each figure that dew
point depression, $\Delta_t$, may have more prognostic value if there
is no stratus the day before. If there is no stratus the day
before, then high $\Delta_t$ appears to be associated with persistence
of no stratus. Since the box plots do overlap, it is clear that
$\Delta_t$ will not provide perfect prediction.

FIGURE: BOX PLOTS FOR LN(TEMP-DEW), by transition (panels include
0 → 0 and 0 → 1; vertical axis LN(TEMP-DEW))


An exploratory spectral analysis of hourly ln(stratus
height) and ln(dew point depression + 1) for 1958 described in
Appendix F also suggests that high (low) dew point depression
is associated with high (low) stratus height.

In Table 2 are corresponding figures for hours of stratus
on previous days.



TABLE 2

Observed Distribution of Previous Days' Hours of Stratus

Year    Transition    Lower Quartile    Median       Upper Quartile
                           (Q)            (M)             (Q)

1958    1 → 0           2 (0.7)         4 (1.4)        8 (2.1)
        1 → 1           6 (1.8)         7 (1.9)        8 (2.1)

1959    1 → 0           3 (1.1)         4 (1.4)        4 (1.4)
        1 → 1           4 (1.4)         6 (1.8)        8 (2.1)

1960    1 → 0           3 (1.1)         4 (1.4)        4 (1.4)
        1 → 1           4 (1.4)         6 (1.8)        8 (2.1)

Again the figures in parentheses are logs.


Again some indications from the table are of interest:

(a) corresponding summary figures are rather stable, but
somewhat less so than for $\Delta_t$,

(b) relatively low values of previous days' hours of stratus
tend to be associated with change to no-stratus condition,
but the tendency is rather weak.

The tendency noticed above may possibly be accounted for by the
fact that an underlying weather system is passing over the
Moffett area. Towards the end of its sojourn there the hours
of resulting stratus tend to gradually decrease to zero.



Box plots for the number of hours of stratus the day
before when there is stratus, for years 1958-60, appear in Figure
3. Each figure contains only those points for which the current
day has no stratus or stratus respectively. There appears to be
an association between a high number of hours of stratus the day
before and persistence of stratus. The association does not
appear strong, however.

Although the above sort of analysis is interesting, it
fails to incorporate the joint (possibly interactive) effects
of several variables. Note that no such analysis is reported
here for the other possible explanatory variables related to
surface wind, namely $V_x$ and $V_y$. Somewhat surprisingly,
these have been found to have secondary value for the location
and years under investigation.
FIGURE



APPENDIX A



Logistic Fitting and Cross-Validation using the BMDP Package

In Appendix E we give several mathematical justifications
for use of the logistic regression model. In the present
Appendix results are given of fitting various logistic models to
available Moffett Field data for years 1958-60; they are
cross-validated for years 1961 and 1962.

Here the term model refers to the basic logistic
representation

$$P\{Y = 1 \mid x\} = \frac{\exp\{x\beta\}}{1 + \exp\{x\beta\}} \qquad \text{(A-1)}$$

where $x$ is a p-vector (row) of explanatory variables, and $\beta$
is a p-vector (column) of coefficients to be determined. The
BMDP package performs the fitting, i.e. determination of $\beta$
from observations, by maximum likelihood or a closely related
method. It also furnishes Student t-values for assessing the
statistical significance of the coefficients determined, and
has a step-wise facility, which enters explanatory variables in
accordance with their judged explanatory value. The above
procedure assumes that the model is appropriate for the data, a
practice that may be dangerous in observational studies, as has
been pointed out by Pregibon (1981), who suggests some remedies.
An examination of remedies for dealing with possibly
"ill-fitting" data by the logistic is currently in progress, and will
be applied to the Moffett Field, and other, data.

In the exercises reported, we have fitted 1958-1960 data
by logistic models using the variable selection feature. Two
types of fits are considered. In one type we condition on the
previous day's stratus state; in other words $P_0(x)$ means the
probability of stratus on day t, given no stratus on t-1 and
the influence of explanatory variables $x$; $P_1(x)$ means the
probability of stratus, given stratus on t-1. In the other type
we have fit all the data at once, using an indicator variable to
identify stratus - no stratus days.

The predictions made are categorical: i.e. if the calculated
p-value exceeds 0.5, stratus is predicted, while if below,
no stratus is predicted. We have cross-validated predictions
against the years 1961 and 1962.

Model A-1: Prediction, given no stratus the previous day
($y_{t-1} = 0$). The explanatory variables selected are: a constant,
$\ln(\Delta_t + 1)$, $V_y$. The fit is as follows with standard errors of
the fitted parameters in parentheses below:

$$x\beta = \underset{(1.70)}{6.63} \; - \; \underset{(0.741)}{3.65}\,\ln(\tilde{\Delta}_t) \; \pm \; \underset{(0.0495)}{0.0878}\, V_y$$

where $\tilde{\Delta}_t = \Delta_t + 1$. (The sign on the $V_y$ term is not legible
in the source.)


The cross-validation results for 1961-1962 (F means Forecast,
A means Actual) are below.

1961-1962

A \ F         0        1      Fraction Correct
  0          88        7           .93
  1          16        6           .27

Fraction Correct = (88 + 6)/(88 + 6 + 7 + 16) = .80
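
The categorical rule and the tally above can be sketched in Python. The counts passed to `fraction_correct` are those of the Model A-1 table; the small probability list and the function names are hypothetical and only illustrate the mechanics.

```python
def classify(probabilities, cutoff=0.5):
    """Forecast stratus (1) when the fitted probability exceeds the cutoff."""
    return [1 if p > cutoff else 0 for p in probabilities]

def cross_tabulate(actual, forecast):
    """Count days by (actual, forecast) pair."""
    table = {(a, f): 0 for a in (0, 1) for f in (0, 1)}
    for a, f in zip(actual, forecast):
        table[(a, f)] += 1
    return table

def fraction_correct(table):
    """Share of days on which forecast and actual agree."""
    return (table[(0, 0)] + table[(1, 1)]) / sum(table.values())

# Counts from the Model A-1 cross-validation table, keyed (actual, forecast).
model_a1 = {(0, 0): 88, (0, 1): 7, (1, 0): 16, (1, 1): 6}
overall = fraction_correct(model_a1)

# Hypothetical fitted probabilities and outcomes for four days.
forecast = classify([0.2, 0.7, 0.5, 0.9])
small = cross_tabulate([0, 1, 1, 0], forecast)
```

A probability exactly equal to the cutoff is treated here as a no-stratus forecast; the text does not say how ties were handled, so that choice is an assumption.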

Note that simple persistence forecasting ("tomorrow is
the same as today") for both 1961 and 1962 gives a fraction of
correct forecasts equal to 0.83 ((88+7)/(88+7+16+4)), which is
actually slightly better than the logistic forecast. However, the
present logistic model does correctly forecast about one-quarter
to one-third of the changes from no stratus to stratus;
persistence will never correctly forecast a change.

Model A-2: Prediction given stratus the previous day ($y_{t-1} = 1$).
The variables selected were $\ln(\Delta_t + 1)$ and $H_{t-1}$. The fit is

$$x\beta = \underset{(2.07)}{6.12} \; - \; \underset{(0.940)}{3.34}\,\ln(\tilde{\Delta}_t) \; + \; \underset{(0.0893)}{0.30}\, H_{t-1}$$

where $\tilde{\Delta}_t = \Delta_t + 1$.

The numbers in parentheses beneath the coefficients are standard
errors based upon the assumption of a correct model,
maximum-likelihood fitted.

The cross-validation results are below.

1961-1962

A \ F         0        1      Fraction Correct
  0          16        6           .73
  1          14       29           .67

Fraction Correct = 0.69 = (16 + 29)/(14 + 29 + 16 + 6)

In this case the logistic model did as well as persistence (0.66)
in predicting stratus and no stratus. Furthermore, it predicted
73% of the changes correctly.

The results of the validation for 1961-1962 of the two
fits in Exercises A-1 and A-2 are combined in the following table.

                  Prediction
Transition    Success    Failure    Fraction Correct
0 → 0            88          7           0.93
0 → 1             6         16           0.27
1 → 0            16          6           0.73
1 → 1            29         14           0.67



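The threat score for predicting changes from 0 → 1 is

$$T_0 = \frac{C_{0\to1}}{N_{0\to1} + F_{0\to0}} = \frac{6}{6 + 16 + 7} = 0.21 \qquad \text{(A-2)}$$

where

$C_{0\to1}$ = the number of correct predictions of change from 0 → 1,
$N_{0\to1}$ = the total actual number of changes from 0 → 1,
$F_{0\to0}$ = the number of incorrect predictions of no change 0 → 0.

Similarly the threat score for predicting changes from
1 → 0 is

$$T_1 = \frac{C_{1\to0}}{N_{1\to0} + F_{1\to1}} = \frac{16}{16 + 6 + 14} = 0.44 \qquad \text{(A-3)}$$

The threat score for predicting all changes is

$$TT = \frac{C_{1\to0} + C_{0\to1}}{N_{1\to0} + N_{0\to1} + F_{0\to0} + F_{1\to1}} = \frac{16 + 6}{16 + 6 + 16 + 6 + 7 + 14} = 0.34 \qquad \text{(A-4)}$$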



The fraction of correct predictions using the logistic
models is

(88 + 6 + 16 + 29)/182 = 0.76.

The fraction of correct predictions using persistence is

(88 + 7 + 29 + 14)/182 = 0.76.

Of course persistence predictions will never be correct when a
change takes place, while the methods just presented, and others,
may actually do quite well and seem worth the extra trouble.
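
The threat scores and fractions correct can be recomputed from the success and failure counts with a few lines of Python. The counts below are those of the combined table for Models A-1 and A-2; the all-changes score follows the pattern of the TT calculation shown for Model A-3, and the function names are hypothetical.

```python
def threat_scores(success, failure):
    """Return (T0, T1, TT) from counts keyed by transition '00', '01', '10', '11'."""
    # T0: correct 0->1 forecasts over actual 0->1 changes plus failures on 0->0 days.
    t0 = success["01"] / (success["01"] + failure["01"] + failure["00"])
    # T1: the same construction for changes 1->0.
    t1 = success["10"] / (success["10"] + failure["10"] + failure["11"])
    hits = success["01"] + success["10"]
    misses_and_false_alarms = sum(failure.values())
    tt = hits / (hits + misses_and_false_alarms)
    return t0, t1, tt

def overall_fraction_correct(success, failure):
    """Correct forecasts over all forecasts."""
    total = sum(success.values()) + sum(failure.values())
    return sum(success.values()) / total

# Combined 1961-1962 validation counts for Models A-1 and A-2.
success = {"00": 88, "01": 6, "10": 16, "11": 29}
failure = {"00": 7, "01": 16, "10": 6, "11": 14}

t0, t1, tt = threat_scores(success, failure)
fc = overall_fraction_correct(success, failure)
```

Unlike the fraction correct, a threat score ignores the days on which no change occurred and none was forecast, so it is not inflated by the many 0 → 0 days.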



Model A-3: Prediction based on all data. The variables selected
are: a constant, $\ln(\Delta_t + 1)$, and $H_{t-1}$. The fit is

$$x\beta = \underset{(1.30)}{6.73} \; - \; \underset{(0.570)}{3.39}\,\ln(\tilde{\Delta}_t) \; + \; \underset{(0.0507)}{0.225}\, H_{t-1}$$

where $\tilde{\Delta}_t = \Delta_t + 1$.

Again numbers in parentheses are standard errors.
The cross-validation results are below:

1961-1962
                  Prediction
Transition    Success    Failure    Fraction Correct
0 → 0            91          4           0.96
0 → 1             5         17           0.23
1 → 0            16          6           0.73
1 → 1            30         13           0.70

Fraction Correct = (91 + 5 + 16 + 30)/182 = 0.78

Fraction Correct Persistence = (91 + 4 + 30 + 13)/182 = 0.76

The threat scores for predicting changes are

$$T_0 = \frac{5}{5 + 17 + 4} = 0.19$$

$$T_1 = \frac{16}{16 + 6 + 13} = 0.46$$

$$TT = \frac{16 + 5}{16 + 5 + 17 + 6 + 4 + 13} = 0.34$$

The threat scores for the fit using all the data are about
the same as those for the separate fits, i.e. those that condition
on whether or not stratus existed on the day before. The fraction
of correct predictions of stratus and no stratus is also about the
same as that for the two separate fits, and for prediction by
persistence. We conclude that doing separate fits based on
whether or not there is stratus the day before may not be
profitable; a single logistic model may do as well as two.

APPENDIX B



Shrinkage

The term "shrinkage" is used in connection with the
following phenomenon: a regression model fit by maximum
likelihood (or least squares) to one set of data which is then used
for prediction on another set of data nearly always fits the
new set of data less well than it does the original set. Copas
(1982) points out that shrinkage can be more pronounced if the
original regression fit is made with the aid of a stepwise
procedure; the latter tends to overfit. He suggests using the
following logistic model for binary prediction:

$$P\{Y = 1 \mid \text{explanatory variable } x\} = \frac{\exp\{\beta_0^* + K \sum_{i=1}^{n} \beta_i^* (x_i - \bar{x}_i)\}}{1 + \exp\{\beta_0^* + K \sum_{i=1}^{n} \beta_i^* (x_i - \bar{x}_i)\}} \qquad \text{(B-1)}$$

where $\bar{x}_i$ is the mean of the $i$th explanatory variable for the
original data. $\{\beta_i^*\}$ are the MLE estimators for the original
data and $K$ is a shrinkage parameter; $K = 1$ means that there
is no shrinkage. Data-derived prescriptions can be found for
$K$, but in the exploratory work reported here we have tried several
numerical trial values and taken note of their general effects.
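
Equation (B-1) can be sketched in Python as below. The coefficient, mean, and K values are hypothetical and chosen only to show the effect of shrinkage; they are not estimates from this report, and the function name is an assumption of this sketch.

```python
import math

def shrunken_probability(x, x_bar, beta0, beta, k=1.0):
    """P{Y = 1 | x} with slope terms shrunk by factor k about the means x_bar."""
    eta = beta0 + k * sum(b * (xi - m) for b, xi, m in zip(beta, x, x_bar))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical fit: one explanatory variable with mean 2.0 in the fitting data.
beta0, beta, x_bar = 0.0, [-2.0], [2.0]
x_new = [1.0]

p_full = shrunken_probability(x_new, x_bar, beta0, beta, k=1.0)   # no shrinkage
p_half = shrunken_probability(x_new, x_bar, beta0, beta, k=0.5)
p_none = shrunken_probability(x_new, x_bar, beta0, beta, k=0.0)   # slopes ignored
```

Because the explanatory variables are centered at their means, reducing K pulls every forecast probability toward the value given by the constant term alone, which makes the forecasts less extreme.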

The stepwise regression procedure of BMDP was used to
fit a logistic model to data from 1958-60. This model with, and
without, shrinkage was then used to predict the occurrence of
stratus in the years 1961-62. Tables 3 and 4 give the results
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of the cross validation. Note that shrinkage slightly improves 
the prediction of no stratus on the following day. 

Tables 5 and 6 give the results of fitting logistic models
to the data from 1958-60 and using the models with and without
shrinkage to predict stratus in 1961. Four different models were
used. The parameters of the models are as follows:

Model    Parameters

A        constant, ln(A_t), y_{t-1}
AW       constant, ln(A_t), y_{t-1}, V_x, V_y
B        constant, ln(A_t), NS_t, NNS_t, H_{t-1}
BW       constant, ln(A_t), NS_t, NNS_t, H_{t-1}, V_x, V_y

where A_t is the dew point depression plus 1.

The models were fit using maximum likelihood. Stratus was
predicted on day t if the forecast probability of stratus was
greater than or equal to α. The cutoff point α was taken to
be 0.5, or alternatively 0.41, the fraction of days of stratus
during the years 1958-1960.

Tables 7 and 8 give similar results for the models fit to
data in 1958-61 and validated on 1962 data. The cutoff point α
was taken to be 0.5, or alternatively 0.37, the fraction of days
of stratus during the years 1958-61.

Tables 9 and 10 give the threat scores for the prediction
of changes (equations A-2, A-3, and A-4).

The simplest model A with a cutoff of 0.5 seemed to do as
well as any of the more complicated models. The use of the
historical fraction of stratus days sometimes improved prediction
of changes, but not in all cases. The use of shrinkage once again
often seemed to improve prediction of changes from stratus to no
stratus, but again not uniformly. Models A and B with no
shrinkage did as well as the stepwise BMDP procedure with no
shrinkage.
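The cutoff rule and the S / F / FC bookkeeping used in the validation tables can be sketched as follows. This is a minimal illustration, not the authors' program: the function name `tally_transitions` and its arguments are hypothetical, and the forecast probabilities are assumed to come from some already-fitted logistic model.

```python
def tally_transitions(y_prev, y_actual, prob, cutoff=0.5):
    """Tabulate successes (S), failures (F) and fraction correct (FC)
    for each transition (previous day -> current day).

    y_prev, y_actual: 0/1 stratus indicators for day t-1 and day t.
    prob: forecast probability of stratus on day t (assumed given).
    Stratus is predicted when prob >= cutoff, as stated in the text.
    """
    table = {(i, j): {"S": 0, "F": 0} for i in (0, 1) for j in (0, 1)}
    for prev, actual, p in zip(y_prev, y_actual, prob):
        predicted = 1 if p >= cutoff else 0
        outcome = "S" if predicted == actual else "F"
        table[(prev, actual)][outcome] += 1
    for cell in table.values():
        n = cell["S"] + cell["F"]
        # FC is undefined when a transition never occurs in the data
        cell["FC"] = cell["S"] / n if n else None
    return table
```

Calling the function once with `cutoff=0.5` and once with the historical fraction of stratus days (0.41 or 0.37 in the text) gives the two cutoff columns of a table for one model and one value of K.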
Table 3

Validation on 1961-62 of BMDP Stepwise Fit
Using all Data 1958-60 with Shrinkage

                 K = 1           K = 0.6         K = 0.5         K = 0.4
Transition    S    F    FC    S    F    FC    S    F    FC    S    F    FC
0 -> 0       91    4   .96   93    2   .98   93    2   .98   95    0   1.0
0 -> 1        5   17   .23    3   19   .14    3   19   .14    1   21   .05
1 -> 0       16    6   .73   17    5   .77   17    5   .77   17    5   .77
1 -> 1       30   13   .70   26   17   .60   26   17   .60   22   21   .51

Validation on 1961 of BMDP Stepwise Fit
Using all Data 1958-60 with Shrinkage

                 K = 1           K = 0.6         K = 0.5         K = 0.4
Transition    S    F    FC    S    F    FC    S    F    FC    S    F    FC
0 -> 0       53    3   .95   54    2   .96   54    2   .96   56    0   1.0
0 -> 1        3    8   .27    3    8   .27    3    8   .27    1   10   .09
1 -> 0        8    3   .73    9    2   .82    9    2   .82    9    2   .82
1 -> 1        9    4   .69    8    5   .62    8    5   .62    7    6   .54

FC = fraction of correct predictions
S = number of successful predictions
F = number of unsuccessful predictions
Table 4



Validation on 1962 of BMDP Stepwise Fit
Using all Data 1958-61 with Shrinkage



K 


1 


u 


. b 




0. 


b 




u . 


4 




trans itions 


s 


Jr’ 


FC 


s 


F 


FC 


b 


F 


FC 


b 


F 


FC 


0 


-4 


u 


SB 


1 


.97 


by 


u 


1 




u 


1 


6 y 


0 


1 


u 


-4 


1 


2 


y 


. 18 


u 


11 


0 


u 


11 


0 


0 


11 


0 


i 


-4 


u 


8 


b 


. 7j 


b 


3 


. 16 


b 


b 


.16 


8 


6 


. 16 


1 


-4 


1 


21 


y 


. 70 


lb 


12 


. bU 


lb 


14 


. b 6 


11 


l y 


. 17 



Model has explanatory variables 


constant 


£n (A ) 


H 

t-1 


Est coefficients 


b . 71 


-3.42 


0.242 


(Std. Errors ) 


(1.12) 


(0.489) 


( 0 . U4b4 ) 









Validation on 1962 of Separate BMDP Stepwise Fits
for Data Points with Stratus or No Stratus the Day
Before, Using Data of 1958-61, with Shrinkage



K 


1 


u 


• b 




u . 


5 




U . 


4 




transitions 


b 


F 


FC 


b 


F 


FC 


b 


F 


FC 


b 


F 


FC 


0 


-4 


u 


bb 


1 


. y7 


by 


U 


i 


by 


U 


1 


by 


U 


i 


0 


-4 


1 


2 


y 


. 18 


u 


1 1 


u 


u 


1 1 


u 


u 


1 i 


u 


1 


-4 


u 


b 


b 


. 73 


b 


b 


. bb 


b 


b 


. 4b 


b 


b 


. 73 


1 


4 - 


1 


^1 


y 

1 


. 70 


22 


b 


.73 


24 


b 


. bU 


2b 


4 


. 8b 



Explanatory variables for model with no stratus the day before.

constant £ n ( A ) V 

t y 

Est Coefficients 6.0b -3.43 - 0 . 0 8 L 

(Std. trror) (0.42) (U.blU) (U.U4b) 



Explanatory variables for model with stratus the day before.



Est Coefficients 
( std . trror ) 



cons tant 

b . 71 
( 1 .82 ) 



£n ( A ) 
t 

- i . b b 

( u.biy ) 



u . yy l 

( 0.0829 ) 
Table 5



Validation on 1961 of Predictions with Shrinkage 
of Models Fit with MLE Using all Data from 1958-60 



Shrinkage 


Mode 1 




A 












Mode 1 




AW 




Parameter 


Cutoff 


0. 


5 




u 


.41 






0 


.5 




u 


.41 






A\F 


1 


0 


PC 


1 


u 


PC 




1 


0 


FC 


1 


u 


PC 




1 


14 


1U 


. 58 


17 


7 


.71 




11 


13 


.46 


17 


7 


.71 




u 


7 


6U 


. yu 


13 


b 4 


.81 




10 


57 


.85 


12 


55 


.82 


K = 1 


transit ions 


s 


F 




s 


F 






s 


F 




b 


F 






0 U 


b 3 


3 


. 9b 


48 


8 


. 8b 




bU 


b 


.89 


48 


8 


. 8b 




0+1 


3 


8 


. 27 


b 


6 


. 4b 




3 


8 


. 27 


b 


6 


. 4b 




1 0 


7 


4 


. 64 


b 


b 


. b 4 




7 


4 


. 64 


7 


4 


.64 




1 1 


11 


2 


.8b 


12 


1 


. y 2 




8 


b 


. 62 


1 2 


1 






A\ P 


1 


u 




i 


u 






i 


U 










1 


11 


13 


.46 


14 


1U 


. b 8 




10 


14 


.42 


15 


y 


. b 3 




u 


4 


63 


. ^4 


7 


6U 


. 8y 




4 


b 3 


. y4 


11 


bb 


. 84 


K = U . b 


transit ions 


S 


F 




s 


F 






s 


F 










u + u 


b 4 


Z 


. y b 


b 6 


2 


.9b 




54 


2 


. 96 


49 


7 


.88 




U + 1 


3 


8 


.27 


6 


8 


.27 




3 


8 


.27 


5 


b 


.4b 




l + u 


y 


2 


. 82 


7 


4 


. 64 




9 


2 


.82 


7 


4 


.64 




1 + 1 


8 


b 


. 6 2 


11 


2 


.8b 




7 


b 


^54_ 


1U 


3 


.77 




A\P 


i 


u 




i 


0 






1 


U 




1 


U 






1 


8 


lb 


. 33 


14 


lu 


. 58 




8 


16 


. 33 


12 


12 


. 50 




u 


4 


63 


.94 


7 


60 


.89 




4 


63 


.94 


11 


56 


. 84 


K = O.b 


trans i t ions 


S 


F 




















U -»■ U 


b4 


2 


.96 


53 


3 


. 95 




54 


2 


.96 


4y 


7 


. 88 




U 1 


3 


8 


.27 


3 


8 


.27 




3 


8 


.27 


4 


7 


.31 




1 -»■ U 


9 


2 


.82 


7 


4 


.64 




9 


2 


. 82 


7 


4 


. 64 




.1 + 1 


5 


8 


. 38 


11 


2 


.85 




5 


8 


. 38 


8 


b 


= i 62 




A\F 


1 


U 




1 


0 






1 


0 




i 


0 






1 


b 


19 


.79 


14 


10 


. 58 




6 


18 


. 2b 


ii 


13 


.85 




U 


U 


b7 


1 


7 


60 


.90 




1 


66 


. yy 


8 


59 


. 88 


K = U.4 


trans i t ions 


s 


F 




s 


F 






s 


F 




S F 






U -*• u 


bb 


U 


1 


53 


3 


.93 




b6 


0 


i 


b 2 


4 


. 93 




U + 1 


1 


10 


. U 9 


3 


8 


. 27 




2 


y 


. 18 


3 


8 


.27 




1 u 


11 


0 


1 


7 


4 


. 64 




lu 


i 


.91 


7 


4 


. b4 ; 




1 - 1 


4 


y 


.31 


11 


2 


.85 




4 


y 


.31 


8 


b 


.62, 



FC = fraction of correct predictions

0.41 = (number of days of stratus in 1958-1960) / (total number of days in 1958-1960)

Explanatory variables in Model A = constant, y_{t-1}, ln(A_t).

Explanatory variables in Model AW = constant, y_{t-1}, ln(A_t), V_x, V_y.



Table 6



Validation on 1961 of Predictions with Shrinkage
of Models Fit with MLE Using all Data from 1958-60



Shrinkage 

Parameter 


Model B 






Mode 1 


bw 




Cutoff 




0 


.3 




u . 


41 






0 . 


5 




u 


.41 






A\F 


1 


0 


PC 


l 


u 


PC 




1 


0 


FC 


1 


u 


FC 








l 


13 


11 


. 54 


16 


a 


. bl 




13 


11 


. 34 


16 


8 


.67 








u 


b 


bl 


.91 


12 


55 


.82 




8 


59 


. 88 


12 


35 


.82 


K = 1 


transit ions 


s 


F 




b 


F 






s 


F 




b 


F 






0 


-* 


u 


53 


3 


. 95 


48 


8 


. 8b 




32 


4 


.93 


48 


8 


.86 




0 


■> 


1 


3 


8 


. 21 


4 


7 


. 3b 




3 


8 


. 27 


3 


6 


.43 




1 


-> 


u 


8 


3 


. 73 


7 


4 


. b 4 




7 


4 


. b4 


7 


4 


.64 




1 


-> 


1 


1U 


3 


. 77 


12 


1 


. y2 




10 


3 


. 11 


11 


2 


.83 




A\P 


1 


0 




i 


u 






1 


U 














1 


11 


13 


. 4b 


13 


11 


. 54 




11 


13 


.4b 


14 


lu 


.58 








u 


4 


63 


. 94 


y 


58 


.87 




4 


b 3 


. 94 


11 


5b 


.84 


K = U. 6 


transit ions 


s 


F 




b 


F 






b 


F 










u 


-+ 


u 


5 4 


2 


. yb 


51 


5 


.yi 




54 


2 


. yb 


49 


7 


. 88 




u 


->* 


1 




8 


. 27 


3 


8 


. 27 




3 


8 


• Zl : 


3 


8 


• 27 




1 


-> 


u 


y 


2 


. 82 


7 


4 


. b4 




y 


2 


. 82 


7 


4 


.64 




1 


> 


1 


8 


5 


. 62 


_lz^ 


_1 


♦ y2 




8 _ 


_5 


_^_6 2 


11 


2 


= .J8 5 




a\p 


i 


U 




1 


U 






i 


U 




1 


U 










1 


ii 


13 


.4b 


13 


ii 


. 54 




1U 


14 


. 42 


13 


ii 


. 34 








u 


3 


b 4 


.9b 


7 


bU 


. yu 




3 


b 4 


. y6 


11 


3b 


. 84 


K = U . 3 


trans i t ions 


b 


F 




b 


F 






b 


F 




b 


F 






u 


-> 


u 


85 


1 


. y8 


53 


3 


. 95 




55 


1 


. ya 


4y 


7 


. 88 




U 


-> 


1 


3 


8 


. 27 


3 


8 


. 27 




5 


8 


.27 


3 


8 


. 27 




1 


> 


u 


y 


2 


• 82 


7 


4 


. b4 




y 


2 


.82 


7 


4 


. 64 




1 


->* 


1 


8 


5 


. 62 


10 


3 


. 77 




7 


6 


^54 


J.U 




..J77 




a\P 


i 


U 




1 


0 






i 


U 




1 


u 










1 


7 


17 


. 29 


13 


ii 


. 54 




6 


18 


. 25 


12 


12 


. 3U 








u 


2 


b 5 


.97 


b 


bl 


. 91 




2 


b 5 


. 91 


8 


39 


. 88 


K = U . 4 


trans i t ions 


b 


F 




b 


F 






S 


F 




b 


F 






U 


-y 


u 


5b 


U 


1 


33 


3 


. y5 




56 


U 


1 


52 


4 


.93 




U 


-y 


1 


1 


1U 


. uy 


3 


a 


. 27 




1 


1U 


. uy 


3 


8 


• 21 , 




1 


-y 


u 


y 


2 


. a 2 


a 


3 


. 73 




y 


2 


. 82 


7 


4 


. 64 




1 


-y 


1 


b 


7 


.4b 


lu 


3 


. 77 




5 


8 


. 38 


y 


4 


.69 



FC = fraction of correct predictions

0.41 = (number of days of stratus in 1958-1960) / (total number of days in 1958-1960)

Explanatory variables in Model B = constant, NS_t, NNS_t, H_{t-1}, ln(A_t).

Explanatory variables in Model BW = constant, NS_t, NNS_t, H_{t-1}, ln(A_t), V_x, V_y.



Table 7



Validation Using 1962 of Predictions Using Shrinkage
and Models Fit with MLE Using all Data from 1958-61



Model A Model Aw 



Shrinkage 

Parameter 


Cutot t 




u 


.5 






U . 


37 






u . 


5 




u . 


37 






a\f 


1 


u 


FC 


1 


0 


FC 




1 


u 


FC 


1 


u 


FC 








1 


26 


15 




. 6 3 


29 


12 


.71 




2b 


16 


. b 1 


29 


12 


.71 








u 


4 


4b 




.93 


8 


42 


. 84 




4 


4b 


.92 


7 


43 


,8b 


K = 1 


transitions 


b 


F 




b F 






b 


F 




b 


F 






u 




u 


38 


i 




.97 


36 


3 


.92 




38 


1 


.97 


37 


2 


. 95 




u 




1 


2 


y 




.18 


3 


8 


. 27 




2 


9 


. 18 


3 


8 


.27 




1 




u 


8 


3 




. 73 


6 


5 


. 55 




8 


3 


.73 


6 


b 


. 55 




1 


-> 


1 


24 


6 




. 8U 


26 


4 


_^87_ 




23 


7 


_^_77 


2 6 ^ 


__4 


=a 87 




a\f 


i 


0 




1 


u 






1 


U 




1 


U 










1 


15 


26 




. 37 


28 


13 


. 6 8 




15 


26 


.37 


26 


15 


.63 








u 


2 


48 




. 96 


6 


44 


. 88 




2 


48 


. 96 


5 


45 


. 9U 


K = U.6 


trans i t ions 


s 


F 




s 


F 






s 


F 




b 


F 






U 


->■ 


u 


39 


u 


i 




38 


1 


.97 




39 


0 


1 


38 


i 


.97 




U 




1 


U 


11 


u 




2 


9 


.18 




U 


11 


U 


2 


9 


.18 




1 




u 


9 


2 




.82 


6 


5 


. 55 




9 


2 


.82 


7 


4 


.64 




1 




1 


15 


15 




. 5U 


26 


4 


.87 




15 


_15 


^5U 


= 2 4 = 


_ = 6 


=i 87 




a\f 


1 


u 




i 


U 






1 


U 




1 


U 










1 


15 


26 




. 37 


26 


15 


.63 




8 


33 


. 2U 


26 


15 


.63 








u 


2 


48 




. 96 


4 


46 


.92 




1 


49 


.98 


b 


45 


. 9 U 


K = U . 5 


trans it ions 


b 


F 




b 


F 






b 


F 




b 


F 






U 


->■ 


u 


39 


u 


1 




38 


i 


. 97 




39 


U 


1 


38 


1 


.97 




U 




1 


U 


11 


U 




2 


9 


.18 




U 


11 


U 


2 


9 


. 18 




1 




u 


9 


2 




.82 


8 


3 


. 7j 




1U 


1 


.91 


7 


4 


.64 




1 




1 


15 


15 i 




♦ 5U 


24 


_6j 


. 8 U 




8_ 


2 2 


. 27 


_2 4__ 


== 6 


=i 87 




a\ f 


1 


u 




i 


u 






1 


U 




1 


U 










1 


7 


34 




.17 


26 


15 


. 6 3 




6 


3b 


.15 


25 


lb 


. 6 1 








u 


U 


5U 


1 


. UU 


4 


46 


.92 




U 


bU 


1 


4 


4b 


.92 


K = U . 4 


trans it ions 


b 


F 




s 


F 






b 






b 


F 






U 


->■ 


u 


39 


u 


1 




38 


1 


.97 




39 


u 


1 


38 


1 


.97 




U 




1 


U 


11 


u 




2 


9 


.11 




U 


11 


U 


2 


9 


.18 




1 


-f 


u 


11 


u 


1 




8 


3 


.73 




11 


u 


1 


8 


3 


.73 




1 


> 


1 


7 


23 




. 23 


24 


6 


. 8U 




6 


24 


. 2U 


23 


7 


. 77 



FC = fraction of correct predictions

0.37 = (number of days of stratus in 1958-1961) / (number of days in 1958-1961)

Model A explanatory variables: constant, y_{t-1}, ln(A_t).

Model AW explanatory variables: constant, y_{t-1}, ln(A_t), V_x, V_y.



Table 8



Validation Using 1962 Data of Predictions Using Shrinkage
and Models Fit with MLE Using all Data from 1958-61



Shrinkage 


Model B 








Model l 


BW 




Parameter 


Cutoff 


u 


.5 




u . 


37 






0 


.5 




u 


.37 






A\F 


1 


0 


FC 


1 


0 


FC 




1 


u 


FC 


1 


0 


FC 




1 


23 


16 


.56 


27 


14 


.66 




23 


16 


. 56 


27 


14 


. bb 




u 


4 


4b 


.92 


6 


42 


. 64 




3 


47 


. 94 


7 


43 


• 8b 


K = 1 


transit ions 


S 


F 




s 


F 






s 


F 




s 


F 






0 + 0 


36 


i 


. 97 


36 


3 


. 92 




39 


U 


1 


37 


2 


.95 




U -v 1 


2 


9 


.16 


3 


6 


. 27 




2 


9 


. 16 


3 


8 


. 27 




1 + 0 


8 


3 


.73 


6 


5 


.55 




6 


3 


.73 


6 


5 


. 55 




1 -v 1 


21 


9 | 


.70 


_24_ 


_6 


= w 60 




_21_ 


9 




_24 


_ _._6 . 


_ .JB 0 




A\F 


1 


0 




1 


0 






1 


U 




1 


0 






1 


16 


23 


.44 


24 


17 


. 59 




16 


23 


.44 


24 


17 


.59 




u 


3 


47 


. 94 


6 


44 


. 66 




3 


47 


. 94 


6 


44 


. 88 


K = U . 6 


transitions 


s 


F 




s 


F 






s 


F 




s 


F 






U * U 


39 


0 


1.00 


36 


i 


.97 




39 


0 


1.00 


38 


i 


.97 




U + 1 


0 


11 


0.00 


2 


9 


.16 




0 


11 


0.00 


2 


9 


.18 




1 + 0 


6 


3 


. 7 3 


6 


5 


. 55 




6 


3 


.73 


6 


5 


. 55 




1 + 1 


16 


12 


.60 


22 


6 


.73 




16 


12 


. 60 


22 


8 


.73 




A\ F 


1 


u 




1 


0 






1 


u 




1 


0 






1 


17 


24 


.41 


24 


17 


. 59 




15 


26 


. 37 


24 


17 


. 59 




U 


3 


47 


. 94 


b 


44 


. 66 




3 


47 


. 94 


5 


45 


. 90 


K = U.b 


transitions 


s 


F 




S 


F 






s 


F 




s 


F 






0 + 0 


39 


0 


1 .00 


38 


1 


.97 




39 


U 


1 


38 


1 


. 97 




0 + 1 


0 


11 


0.00 


2 


9 


.16 




0 


il 


0 


2 


9 


.18 




1 + u 


8 


3 


.73 


b 


5 


. 55 




6 


3 


.73 


7 


4 


.64 




1 + 1 


17 


13 


. 57 


= 22_ 


_8 


= w 73 




15 


Ak 


_._50_ _ 


2 2^ 


_ _JB 


= •=14 




A\ F 


i 


u 




1 


0 






1 


u 




1 


U 






1 


10 


31 


. 24 


24 


17 


. 59 




7 


34 


.17 


23 


18 


. 56 




U 


3 


47 


. 94 


b 


45 


. 90 




3 


47 


. 94 


4 


46 


. 92 


K = U.4 


trans i t ions 


s 


F 




S 


F 






S 


F 




s 


F 






U + U 


39 


0 


1 .00 


38 


1 


. 97 




39 


0 


1 .00 


38 


i 


.97 




U + 1 


U 


1 1 


0.00 


2 


9 


.16 




0 


11 


0.00 


2 


9 


. 18 




1 + 0 


8 


3 


.73 


7 


4 


. 64 




8 


3 


. 73 


8 


3 


. 73 




1 + 1 


1U 


20 


.33 


22 


8 


.73 




7 


23 


. 23 


21 


9 


.70 



FC = fraction of correct predictions

0.37 = (number of days of stratus in 1958-1961) / (number of days in 1958-1961)

Model B explanatory variables: constant, NS_t, NNS_t, H_{t-1}, ln(A_t).

Model BW explanatory variables: constant, NS_t, NNS_t, H_{t-1}, ln(A_t), V_x, V_y.



Table 9

Threat Scores for 1961 Validation
of Models Fit with MLE on
Data from 1958-1960

Model                A              AW               B              BW
Cutpoint         0.5    0.41    0.5    0.41    0.5    0.41    0.5    0.41

K = 1     T_0    0.21   0.26    0.18   0.26    0.21   0.21    0.20   0.26
          T_1    0.54   0.50    0.44   0.58    0.57   0.58    0.50   0.54
          TT     0.37   0.35    0.30   0.39    0.39   0.35    0.34   0.38

K = 0.6   T_0    0.23   0.21    0.23   0.28    0.23   0.19    0.23   0.17
          T_1    0.56   0.54    0.53   0.50    0.56   0.58    0.56   0.54
          TT     0.41   0.37    0.40   0.38    0.41   0.36    0.41   0.32

K = 0.5   T_0    0.23   0.21    0.23   0.22    0.25   0.21    0.25   0.17
          T_1    0.47   0.54    0.47   0.44    0.56   0.50    0.53   0.50
          TT     0.38   0.37    0.38   0.32    0.43   0.36    0.41   0.31

K = 0.4   T_0    0.09   0.21    0.18   0.20    0.09   0.21    0.09   0.20
          T_1    0.55   0.54    0.50   0.44    0.50   0.57    0.47   0.47
          TT     0.39   0.37    0.39   0.32    0.34   0.39    0.33   0.33

Model    Explanatory Variables

A        constant, ln(A_t), y_{t-1}
AW       constant, ln(A_t), y_{t-1}, V_x, V_y
B        constant, ln(A_t), NS_t, NNS_t, H_{t-1}
BW       constant, ln(A_t), NS_t, NNS_t, H_{t-1}, V_x, V_y

0.41 = (No. days of stratus during 1958-60) / (No. days during 1958-60)



Table 10

Threat Scores for 1962 Validation
of Models Fit with MLE on Data
from 1958-1961

Model                A              AW               B              BW
Cutpoint         0.50   0.37    0.50   0.37    0.50   0.37    0.50   0.37

K = 1     T_0    0.17   0.21    0.17   0.23    0.17   0.21    0.18   0.23
          T_1    0.47   0.40    0.44   0.40    0.40   0.35    0.40   0.32
          TT     0.34   0.31    0.33   0.32    0.31   0.29    0.32   0.30

K = 0.6   T_0    0      0.17    0      0.17    0      0.17    0      0.17
          T_1    0.35   0.40    0.35   0.41    0.35   0.32    0.35   0.32
          TT     0.24   0.30    0.24   0.31    0.24   0.26    0.24   0.26

K = 0.5   T_0    0      0.17    0      0.17    0      0.17    0      0.17
          T_1    0.35   0.47    0.30   0.41    0.33   0.32    0.31   0.37
          TT     0.24   0.34    0.23   0.31    0.23   0.26    0.22   0.29

K = 0.4   T_0    0      0.17    0      0.17    0      0.17    0      0.17
          T_1    0.32   0.47    0.31   0.44    0.26   0.37    0.24   0.40
          TT     0.24   0.34    0.24   0.33    0.19   0.29    0.18   0.31

Model    Explanatory Variables

A        constant, ln(A_t), y_{t-1}
AW       constant, ln(A_t), y_{t-1}, V_x, V_y
B        constant, ln(A_t), NS_t, NNS_t, H_{t-1}
BW       constant, ln(A_t), NS_t, NNS_t, H_{t-1}, V_x, V_y

0.37 = (No. days of stratus during 1958-61) / (No. days during 1958-61)
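The threat scores can be recomputed from the transition counts of the validation tables. The defining equations (A-2, A-3, A-4) are not reproduced in this part of the report, so the sketch below assumes the usual form, hits / (hits + misses + false alarms), applied to predicted changes; the function name `threat_scores` and the layout of `counts` are hypothetical. Under that assumption the Table 5 counts for Model A with K = 1 and cutoff 0.5 give values that round to the corresponding Table 9 entries (0.21, 0.54, 0.37).

```python
def threat_scores(counts):
    """Threat scores for predicted changes.

    counts[(i, j)] = (S, F): successful and unsuccessful predictions
    for days with state i on day t-1 and state j on day t.
    ASSUMPTION: threat score = hits / (hits + misses + false alarms).
    """
    hit01, miss01 = counts[(0, 1)]      # changes no stratus -> stratus
    hit10, miss10 = counts[(1, 0)]      # changes stratus -> no stratus
    false01 = counts[(0, 0)][1]         # change to stratus forecast, none occurred
    false10 = counts[(1, 1)][1]         # change to no stratus forecast, none occurred
    n01 = hit01 + miss01 + false01
    n10 = hit10 + miss10 + false10
    t0 = hit01 / n01
    t1 = hit10 / n10
    tt = (hit01 + hit10) / (n01 + n10)  # pooled score for both kinds of change
    return t0, t1, tt


# Counts read from Table 5 (Model A, K = 1, cutoff 0.5).
table5_model_a = {(0, 0): (53, 3), (0, 1): (3, 8), (1, 0): (7, 4), (1, 1): (11, 2)}
t0, t1, tt = threat_scores(table5_model_a)
```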



APPENDIX C



Robust Estimation for Binary Logistic Regression

Maximum likelihood estimates are susceptible to outlying
data points: they are unduly influenced by a few (exceptional)
data points which may not agree with the assumed model. Pregibon
(1982) suggests robust procedures which yield estimates that are
resistant to a few such exceptional data points. The procedure
that has been used in this report is as follows.

Let the deviance of point i be

$$ d_i = -2\left( y_i \ln \hat{p}_i + (1 - y_i)\ln(1 - \hat{p}_i) \right), \qquad i = 1,\dots,N, \tag{C-1} $$

where in the logistic model

$$ \hat{p}_i = \frac{\exp\{\mathbf{x}_i \hat{\boldsymbol\beta}\}}{1 + \exp\{\mathbf{x}_i \hat{\boldsymbol\beta}\}} \tag{C-2} $$

and

$$ \mathbf{x}_i \hat{\boldsymbol\beta} = \hat\beta_0 + \hat\beta_1 x_{i1} + \hat\beta_2 x_{i2} + \dots + \hat\beta_p x_{ip}; \tag{C-3} $$

$x_{ik}$ is the value of the k-th explanatory variable for the
i-th data point, and $\hat\beta_k$ is the estimate of $\beta_k$, the regression
coefficient for the k-th explanatory variable, $x_k$; k = 1,...,p.

The problem of finding the MLE estimators turns out to
be to solve for $\hat\beta_0,\dots,\hat\beta_p$ in the non-linear equations

$$ \sum_{i=1}^{N} x_{ik} \left( y_i - \frac{e^{\mathbf{x}_i \hat{\boldsymbol\beta}}}{1 + e^{\mathbf{x}_i \hat{\boldsymbol\beta}}} \right) = 0 \tag{C-4} $$

for k = 1,...,p (and, for the constant term, the same equation
with $x_{i0} = 1$).
One possible robust-resistant (insensitive to outliers)
procedure is to find estimators $\tilde\beta_0,\dots,\tilde\beta_p$ such that

$$ \sum_{i=1}^{N} w(i)\, x_{ik} \left( y_i - \frac{e^{\mathbf{x}_i \tilde{\boldsymbol\beta}}}{1 + e^{\mathbf{x}_i \tilde{\boldsymbol\beta}}} \right) = 0 \tag{C-5} $$

where

$$ w(i) = \begin{cases} 1 & \text{if } d_i \le H, \\ \text{[expression not legible in this copy]} & \text{otherwise,} \end{cases} \tag{C-6} $$

and $d_i$ is the deviance of the i-th data point and the fitted model
at that point, from (C-1).

A value of H = 1.35 was suggested by Pregibon and used
for the tuning constant; if H = ∞ the procedure carries out
the ordinary MLE fitting, while as H decreases the points of
extreme local deviance have progressively less effect on the
fitted model. Notice that the i-th data-determined weight, w(i),
is made relatively small if $d_i$ is large. Thus data points
which are not well fit by the assumed model will tend to receive
less weight than others that are. The resistant estimates, $\tilde{\boldsymbol\beta}$,
are found by iteration. First the MLE estimate is found and the
initial weights computed. Then (C-5) is solved for
{$\tilde\beta_k(1)$, k = 1,...,p} by a Newton-Raphson procedure. New weights
are computed from (C-6). Then these are entered in (C-5), and it
is solved for {$\tilde\beta_k(2)$, k = 1,...,p}; this process repeats until
the iterative estimates converge.
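The iteration just described can be sketched for a model with a constant and a single explanatory variable. This is an illustration under stated assumptions, not the procedure's original code: all function names are hypothetical, and because the down-weighting branch of (C-6) cannot be read in this copy, the sketch assumes a Huber-type weight, sqrt(H / d_i) for d_i > H, which equals 1 at d_i = H and shrinks as the deviance grows.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def deviance(y, p):
    """Deviance of one data point, as in (C-1)."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
    return -2.0 * (y * math.log(p) + (1 - y) * math.log(1.0 - p))


def huber_weight(d, H):
    # ASSUMPTION: Huber-type down-weighting; the report's own
    # expression for the case d > H is not recoverable here.
    return 1.0 if d <= H else math.sqrt(H / d)


def weighted_fit(x, y, w, b0=0.0, b1=0.0, max_iter=50, tol=1e-10):
    """Newton-Raphson solution of the weighted score equations (C-5)
    for logit(p) = b0 + b1 * x, with the weights w held fixed."""
    for _ in range(max_iter):
        p = [logistic(b0 + b1 * xi) for xi in x]
        g0 = sum(wi * (yi - pi) for wi, yi, pi in zip(w, y, p))
        g1 = sum(wi * xi * (yi - pi) for wi, xi, yi, pi in zip(w, x, y, p))
        v = [wi * pi * (1.0 - pi) for wi, pi in zip(w, p)]
        h00 = sum(v)
        h01 = sum(vi * xi for vi, xi in zip(v, x))
        h11 = sum(vi * xi * xi for vi, xi in zip(v, x))
        det = h00 * h11 - h01 * h01
        s0 = (h11 * g0 - h01 * g1) / det  # Newton step: solve the 2x2 system
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1


def resistant_fit(x, y, H=1.35, rounds=20, tol=1e-8):
    """Start from the MLE (all weights 1), then alternate between
    recomputing weights from the deviances and re-solving (C-5)."""
    b0, b1 = weighted_fit(x, y, [1.0] * len(x))
    for _ in range(rounds):
        d = [deviance(yi, logistic(b0 + b1 * xi)) for xi, yi in zip(x, y)]
        w = [huber_weight(di, H) for di in d]
        nb0, nb1 = weighted_fit(x, y, w, b0, b1)
        converged = abs(nb0 - b0) + abs(nb1 - b1) < tol
        b0, b1 = nb0, nb1
        if converged:
            break
    return b0, b1
```

With `H=float("inf")` every weight is 1 and the function returns the ordinary maximum likelihood fit, in line with the remark above about H = ∞. The sketch has no safeguard for perfectly separable data, for which no finite maximum likelihood estimate exists.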
On each day either stratus occurs or not. If stratus
occurs on consecutive days then a run of stratus days is said
to occur. Let $NS_t$ be the length of the run of stratus days
that includes day t-1. For example, $NS_t = 0$ if the previous
day had no stratus, so $y_{t-1} = 0$; while
$NS_t = 2$ if $y_{t-1} = 1$, $y_{t-2} = 1$, $y_{t-3} = 0$. Let $NNS_t$ be the
length of the last run of no stratus days that includes day t-1.
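The two run-length variables can be computed in one pass over the daily stratus indicators. The following is a minimal sketch; the function name `run_length_covariates` is hypothetical, and it assumes that NNS_t is 0 whenever day t-1 had stratus (and NS_t is 0 whenever it did not), and that both are 0 on the first day, when there is no history.

```python
def run_length_covariates(y):
    """Return a list of (NS_t, NNS_t) pairs, one per day t.

    y is a sequence of 0/1 indicators (1 = stratus). The pair for
    day t uses only the days up to and including day t-1.
    """
    ns = nns = 0              # lengths of the runs ending on the previous day
    covariates = []
    for today in y:
        covariates.append((ns, nns))
        if today == 1:        # a stratus run continues, no-stratus run resets
            ns, nns = ns + 1, 0
        else:                 # a no-stratus run continues, stratus run resets
            ns, nns = 0, nns + 1
    return covariates
```

For the sequence 0, 1, 1, 0 the pair for the fourth day is (2, 0), matching the example NS_t = 2 given above.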

Table 11 gives the estimates for five iterations of the
robust procedure applied to a model using 1958-1960 data. The
explanatory variables are: constant, NS_t, NNS_t, H_{t-1}, ln(A_t),
where A_t is the dew point depression plus 1.

TABLE 11

Results of Iteration of Resistant Procedure

Number of
Iteration    Constant    NS_t          NNS_t         H_{t-1}    ln(A_t)

0 (MLE)        6.81      -0.01         -0.05          0.21      -3.34
1              9.30      -0.04         -0.05          0.28      -4.55
2              9.98      [illegible]   -0.04          0.30      -4.88
3             10.16      -0.06         -0.04          0.31      -4.97
4             10.21      [illegible]   [illegible]    0.31      -5.00
5             10.22      -0.06         -0.04          0.31      -5.00

Note that, except for the estimated coefficient of NNS_t, the
resistant procedure has made the estimates greater in absolute
value. Such sharpening of the expression is a common occurrence
when robust logistic procedures are utilized.



We fit this model B robustly to 1958-60 data and then used
the fitted model to predict the occurrence of stratus with a cutoff
point of 0.5. We also robustly fit model B to 1958-61 data and
used it to predict the occurrence of stratus in 1962. Although the
estimated parameters using the robust procedure were different, the
results of the cross-validation were almost the same as with the
maximum likelihood fit reported in Appendix B. Results of the
cross-validations with models fit robustly appear in Table 18 at
the end of Appendix D.
APPENDIX D



Logistic Models with Updating

Despite best attempts to develop a single model with which
to predict stratus in any given year, the resulting model may
suffer from lack of timeliness. The basic reason is that simple
models fitted with data from one period may well not be entirely
relevant to another, owing to changing conditions not represented
in the model. One attractive procedure for dealing with the lack
of timeliness is to progressively update the model fit so
as to incorporate recent data, i.e. data representing conditions
near in time to those to be forecast. This is the philosophy of
the well-known Kalman filter. In the present context the updating
procedure has been carried out completely straightforwardly, i.e.
by simply re-computing estimates using recent data. Computationally
economical and sophisticated methods remain to be developed.

We report the results of an investigation of updated model
fits to predict the occurrence of stratus. Three updating schemes
were tried.

I. A model was initially fit using all data from the previous
year. Then a forecast of the occurrence of stratus was made using
the model for the first ten days of the current (forecast) year.
These ten days were then added to the modeling data set, and
the oldest, or initial, ten days of data were dropped. The model
was re-fit using the updated data. Using the new model, the
occurrence of stratus in the next ten days of the current year was
forecast. Then the second-oldest ten days' worth of data were
dropped, the newest ten days were added, the model was
re-fit, forecasts were made, and so the process was continued. This
may be referred to as a 90-day rolling forecast in steps of ten
days.

II. A model was initially fit using all data from the previous
year. A forecast for the occurrence of stratus was made for the
first day of the current year. This data point was added to the
modeling data set, and the oldest point deleted. The model was
refit using the altered modeling data set. A forecast of the
occurrence of stratus was made for the next day of the current year.
This data point was added to the modeling data set and the oldest
point was dropped, and so forth. This is a rolling forecast in
one-day steps.

III. Same as II but the initial modeling data set includes only
the last 45 points of the previous year.
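The three schemes differ only in the size of the initial modeling window and in the step length, so their bookkeeping can be sketched with a single generator. This is an illustrative sketch: the name `rolling_windows` and its arguments are hypothetical, days are indexed 0, 1, 2, ... with the initial modeling data first and the forecast-year days after them, and the model fitting itself is left out.

```python
def rolling_windows(n_train, n_new, step):
    """Yield (train_indices, forecast_indices) for a rolling re-fit.

    n_train: number of days in the initial modeling data set.
    n_new:   number of days in the forecast year.
    step:    days forecast per re-fit (10 for scheme I, 1 for II and III).
    After each step the newly observed days are added to the modeling
    set and the same number of oldest days are dropped.
    """
    start = 0
    while start < n_new:
        stop = min(start + step, n_new)
        train = list(range(start, start + n_train))
        forecast = list(range(n_train + start, n_train + stop))
        yield train, forecast
        start = stop
```

Under these assumptions scheme I corresponds to `rolling_windows(n, m, 10)` with n the number of days in the previous year, scheme II to `rolling_windows(n, m, 1)`, and scheme III to `rolling_windows(45, m, 1)`.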

Two different sets of explanatory variables were tried,
A and B, with and without wind speeds, where

A: constant, y_{t-1}, ln(A_t)

AW: constant, y_{t-1}, ln(A_t), V_x(t), V_y(t)

B: constant, NS_t, NNS_t, H_{t-1}, ln(A_t)

BW: constant, NS_t, NNS_t, H_{t-1}, ln(A_t), V_x(t), V_y(t)

as before; A_t is the dew point depression plus 1.

A prediction of stratus was made if the forecasted probability
was greater than α. In most cases α = 0.5. Additionally
α was sometimes taken to be the fraction of the number of days
of stratus over all years previous to the current year.

The results are summarized in Tables 12-14 of threat scores
(T_0, T_1, TT) and fraction of correct predictions (FC). For
comparison purposes results are also given for prediction without
updating. Full tables of the numbers of correct and incorrect
predictions can be found in Tables 15-18.

As stated previously, the cutoff point, α, for the updating
procedures was either 0.5 or, alternatively, the historical
fraction of days of stratus. For the simpler model A, the use of
the historical fraction appeared to improve prediction of stratus,
but to worsen the prediction of no stratus. Using robust estimates
in updating procedure I gave the same results as using the simpler
MLE estimates. The more complicated model B often (but not always)
improved predictions of changes. Adding information about winds to
either model A or B never improved prediction much. Using shrinkage
with updating procedure II once again tended to improve
prediction of changes from stratus to no stratus, but tended to worsen
prediction of a change from no stratus to stratus. Updating
procedure III often seemed to do better in predicting changes from
no stratus to stratus than updating procedure II; however, it did
worse when predicting changes from stratus to no stratus. Updating
procedure I always did at least as well in predicting changes
from stratus to no stratus, but sometimes not as well as III in
predicting changes from no stratus to stratus. Model B with an
updating procedure often did better than Model A with updating,
particularly in predicting changes from no stratus to stratus. In
summary, models with updating sometimes did better than models with
no updating, but the improvement was surprisingly small.



Table 12

Threat Scores for Changes and Fraction of Predictions
Correct for 1961 Predictions
Based on Models With and Without Updating

Model A (last two columns: Model AW)

Model       A      A      A      A      A      A      A      AW     AW
Data used   1958-1960     1960                               1958-1960
Updating    NO     NO     NO     I      I      II     III    NO     NO
Method      MLE    MLE    MLE    MLE    MLE    MLE    MLE    MLE    MLE
α           0.5    0.41   0.5    0.5    0.41   0.5    0.5    0.5    0.41

T_0         0.21   0.26   0.26   0.25   0.39   0.25   0.29   0.18   0.26
T_1         0.54   0.50   0.56   0.53   0.50   0.50   0.50   0.44   0.58
TT          0.37   0.35   0.40   0.39   0.44   0.38   0.39   0.30   0.39
FC          0.81   0.78   0.77   0.79   0.80   0.78   0.78   0.75   0.79

Model B (last two columns: Model BW)

Model       B      B      B      B      B      B      B      B      BW     BW
Data used   1958-1960            1960                               1958-1960
Updating    NO     NO     NO     NO     I      I      II     III    NO     NO
Method      MLE    MLE    Robust MLE    MLE    Robust MLE    MLE    MLE    MLE
α           0.5    0.41   0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.41

T_0         0.21   0.21   0.21   0.35   0.43   0.43   0.29   0.29   0.20   0.26
T_1         0.57   0.58   0.57   0.56   0.60   0.60   0.56   0.44   0.50   0.54
TT          0.39   0.35   0.39   0.45   0.52   0.52   0.43   0.36   0.34   0.38
FC          0.81   0.78   0.81   0.80   0.84   0.84   0.81   0.77   0.79   0.78

Model A has explanatory variables: constant, y_{t-1}, ln(A_t)

Model B has explanatory variables: constant, NS_t, NNS_t, H_{t-1}, ln(A_t)

Fraction correct using persistence is 0.76
Table 13

Threat Scores for Changes and Fraction of Predictions Correct for 1962
Predictions

Model A (columns by data used to fit model: 1958-1961; 1961; Model AW)

Model A has explanatory variables: constant, y_{t-1}, ln(A_t)

Model B has explanatory variables: constant, NS_t, NNS_t, H_{t-1}, ln(A_t)

Fraction correct using persistence is 0.76
Table 15

Validations for Rolling Fits
(10 days at a time, initiating with only the
previous year)

Historical fraction of days of stratus = (number of days of stratus during all previous years) / (number of days in all previous years)

Model A explanatory variables: constant, y_{t-1}, ln(A_t)

Model B explanatory variables: constant, NS_t, NNS_t, H_{t-1}, ln(A_t)


Table 16

One-Year Validations for Updating MLE Fits
of Models for One Day Ahead and Dropping
the Oldest Day (cutoff = 0.5)

E: entire previous year used to fit initial model

H: half previous year used to fit initial model

Model A explanatory variables: constant, y_{t-1}, ln(A_t)

Model B explanatory variables: constant, NS_t, NNS_t, ln(A_t)

FC = fraction of correct predictions






Table 17

Validation of Updating of Model B with Shrinkage.

The model was initially fit with the entire previous
year; one point from the new year was added and the oldest
point dropped in each iteration.

FC = fraction correct

Model B explanatory variables: constant, NS_t, NNS_t, ln(A_t).

Cutoff = 0.5.
Table 18

Validations for MLE Fits without Updating Based
on Different Amounts of Historical Data

Cutoff point = 0.5.

Model B fit robustly to data 1958-60 and cross-validated on 1961 gives
the same results as MLE.

Model B fit robustly to data 1958-61 and cross-validated on 1962 gives
the same results as MLE except in the cases * and +; for * the
corresponding numbers are 9 and 49; for + the corresponding numbers are 7 and 4.
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APPENDIX E 



Survival Models; Relation to the Logistic Representation.

E.1 Preliminary Models

Suppose a system occupies one of two states for a varying
("random") time period, then switches to the other, and back.
Such events occur at times t = 0, 1, 2, 3, .... Such is the case
with the stratus-no stratus fluctuation that has been studied,
but is also true of many other weather-related events, rainfall-
no rainfall being a prime example.

We discuss several traditional stochastic models as a
preliminary.

Model 1: Markov Chain

Let $Y_t$ denote the state variable of the system at time
t. Suppose (here i, j = 0, 1)

$$P\{Y_t = j \mid Y_{t-1} = i\} = p_{ij} > 0 ; \tag{E-1}$$

in particular, no further past history is useful:

$$P\{Y_t = j \mid Y_{t-1} = i,\ Y_{t-2} = a,\ Y_{t-3} = b, \ldots\} = p_{ij} \tag{E-2}$$

for all i, j and all t.

There is then a long-run or steady-state distribution
$\{\pi_0, \pi_1\}$ that satisfies balance equations:

$$\pi_0\, p_{01} = \pi_1\, p_{10} = (1 - \pi_0)\, p_{10}$$

so

$$\pi_0 = \frac{p_{10}}{p_{10} + p_{01}}, \qquad \pi_1 = \frac{p_{01}}{p_{10} + p_{01}} . \tag{E-3}$$

If such a model truly described nature, i.e. stratus level at
an airport, then $\pi_1$ could be referred to as the climatological
probability of stratus ($Y_t = 1$) on a day. Such a model may be
fitted to data: one simply estimates $p_{10}$, for example, by
the fraction of stratus days (state 1) that are followed by a
no-stratus day (state 0) in an observational period. The model
does not have the capacity to incorporate physical parameters or
explanatory variables, such as dewpoint depression.
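The fitting step just described can be sketched in a few lines. This is only an illustration under stated assumptions: the daily record `y` is made up, the function names are hypothetical, and both states are assumed to occur at least once before the last day so that no division by zero arises.

```python
def fit_two_state_chain(y):
    """Estimate p[i][j] = P{Y_t = j | Y_{t-1} = i} by transition fractions."""
    counts = [[0, 0], [0, 0]]
    for prev, cur in zip(y, y[1:]):
        counts[prev][cur] += 1
    p = []
    for i in (0, 1):
        row_total = counts[i][0] + counts[i][1]
        p.append([c / row_total for c in counts[i]])
    return p

def stationary(p):
    """Steady-state (pi_0, pi_1) from the balance equation pi_0 p01 = pi_1 p10."""
    p01, p10 = p[0][1], p[1][0]
    return p10 / (p01 + p10), p01 / (p01 + p10)

# Hypothetical daily record: 1 = stratus, 0 = no stratus.
y = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0]
p = fit_two_state_chain(y)
pi0, pi1 = stationary(p)   # pi1 plays the role of the climatological probability
```

With real data the same two functions would be applied to the observed stratus indicator series; nothing here uses explanatory variables, which is exactly the limitation noted above.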

Model 2: Two-State Renewal Process

Let S represent the generic length of a stratus period,
i.e. the number of days throughout which there is uninterrupted
stratus ($Y_t = 1$). Just before S, and just after, there will be
periods of one or more no-stratus days; let such a generic
period be C (C denotes "clear"; throughout the period
$Y_t = 0$). If $\{S_i\}$ is a sequence of statistically independent
stratus periods from the same distribution, and $\{C_i\}$ is a
collection of corresponding clear periods, then the time history
of system state appears as below:



[Figure: time history of the system state $Y_t$ against t,
alternating between stratus periods and clear periods.]



In the long run,

$$\lim_{t \to \infty} P\{Y_t = 1\} = \frac{E[S]}{E[S] + E[C]} = \frac{\text{Mean Length of Stratus Period}}{\text{Mean Length of Stratus Period} + \text{Mean Length of No-Stratus Period}} . \tag{E-5}$$

The above can be called the climatological probability of stratus
on a day. Strictly, the two-state renewal process model stipulates
that the sequence of stratus day periods $\{S_i\}$ is one of
independent, identically distributed random variables, as is the
sequence of clear day periods $\{C_i\}$; the two sequences are
mutually independent. The Markov chain model is a special case of
the two-state renewal model in which stratus periods, generically
S, have a geometric distribution with mean E[S], and the clear
periods, C, have their, generally different, geometric
distribution with mean E[C].

Once again, this model contains no direct accounting for
the possible influence of explanatory variables upon the
probabilities of stratus state changes.
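As a rough illustration of (E-5), the sketch below splits a 0/1 record into stratus and clear runs and replaces E[S] and E[C] by sample means. The record `y` and the function names are hypothetical, and the first and last runs are treated as complete even though in practice they are cut off by the observation window.

```python
from itertools import groupby

def run_lengths(y):
    """Split a 0/1 series into lengths of stratus runs (S) and clear runs (C)."""
    stratus, clear = [], []
    for state, run in groupby(y):
        (stratus if state == 1 else clear).append(len(list(run)))
    return stratus, clear

def climatological_probability(y):
    """E[S] / (E[S] + E[C]) with the expectations replaced by sample means."""
    s, c = run_lengths(y)
    mean_s = sum(s) / len(s)
    mean_c = sum(c) / len(c)
    return mean_s / (mean_s + mean_c)

# Hypothetical daily record: 1 = stratus, 0 = no stratus.
y = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0]
s_runs, c_runs = run_lengths(y)
prob = climatological_probability(y)
```

The run lengths returned here are also the raw material for the duration-based forecasts discussed in Section E.2.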

E.2 The General Survival Model

Suppose a forecaster is in action at time t. He easily
notes the current system state; suppose $Y_t = 0$, i.e. no
stratus. He wishes to predict the system state at t + 1. A
believer in Model 2 will act in an actuarial fashion, computing
the conditional probability that the same state will prevail
("survival" occurs), given that the current clear state has
lasted for d days:

$$P\{C \ge d+1 \mid C \ge d\} = e^{-h_0(d+1)} \tag{E-6}$$

or

$$P\{Y_{t+1} = 1 \mid Y_t = 0,\ Y_{t-1} = 0, \ldots, Y_{t-d+1} = 0,\ Y_{t-d} = 1\} = P\{C < d+1 \mid C \ge d\} = 1 - e^{-h_0(d+1)} . \tag{E-7}$$



Similarly, if stratus is now present ($Y_t = 1$),

$$P\{Y_{t+1} = 1 \mid Y_t = 1,\ Y_{t-1} = 1, \ldots, Y_{t-d+1} = 1,\ Y_{t-d} = 0\} = e^{-h_1(d+1)} . \tag{E-8}$$



The quantities $h_0(d)$, $h_1(d)$ may be referred to as the hazards
associated with the states in question, for

$$1 - e^{-h_1(d+1)} \approx h_1(d+1) \quad \text{if } h_1(d+1) \text{ is small} \tag{E-9}$$

is the conditional probability, or, picturesquely, hazard, that
a stratus period of duration ("life length" or "age") d actually
"dies", or changes to a non-stratus period at age d+1.
Similarly when a non-stratus period is in progress, the change
occurs with hazard $h_0(d+1)$.
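The actuarial calculation can be illustrated with an empirical hazard computed from observed run lengths. This is a sketch under assumptions: the list `clear_runs` is invented, the function name is hypothetical, and at least one run is assumed to survive past age d (otherwise the logarithm is undefined).

```python
import math

def empirical_hazard(durations, d):
    """h(d+1) = -ln P{C >= d+1 | C >= d}, with P replaced by a sample fraction."""
    at_risk = sum(1 for c in durations if c >= d)
    survived = sum(1 for c in durations if c >= d + 1)
    return -math.log(survived / at_risk)

# Hypothetical lengths (in days) of ten observed clear periods.
clear_runs = [1, 1, 2, 2, 2, 3, 4, 4, 6, 9]

h2 = empirical_hazard(clear_runs, 1)     # hazard at age 2, given survival to age 1
change_prob = 1.0 - math.exp(-h2)        # probability the clear period ends
```

Here eight of the ten runs last at least two days, so the change probability is 0.2 while the hazard itself is about 0.22, which shows how close the two are when the hazard is small.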

A promising enterprise is now to enhance the above forecast
of survival, or death, at age d + 1 by further relevant
information about the physical environment of the process. Under
present circumstances, i.e. when forecasting stratus, one might
well use dewpoint depression as well as previous days of stratus
(or no-stratus). Other explanatory variables might well be
appropriate, and can perhaps be identified from physical arguments
augmented by graphical or other exploratory techniques.

In order to utilize the hazard notion in a regression
context it is convenient to put

$$h_0 = \exp\{\mathbf{x}_t \boldsymbol{\beta}_0\} \tag{E-10}$$

where for instance the vector of explanatory variables might be

$$\mathbf{x}_t = (1,\ NS_t,\ \Delta_t,\ H_{t-1}) \tag{E-11}$$

and

$$\boldsymbol{\beta}_0 = (\beta_{01}, \ldots, \beta_{0p}) \tag{E-12}$$

is the required system of constants. A form such as (E-10) can
never be negative, a minimal requirement. Precisely the
expression (E-10) has been used by Cox (1972) for describing
hazards. Actually Cox's hazard is written as

$$\lambda(t) \exp\{\mathbf{x}\boldsymbol{\beta}\} . \tag{E-13}$$

Suppose observations are available on n days: these
are of the form

$$(y_t,\ x_{t1},\ x_{t2}, \ldots, x_{tp}),$$

where, as was mentioned earlier, possibly

$$x_{t1} = 1, \quad x_{t2} = \Delta_t, \quad x_{t3} = H_{t-1}, \quad x_{t4} = NS_t \ \text{(i.e. number of days of continuous stratus)}.$$

Note that interactions and transformations can directly be
included; e.g. simply put $x_{t5} = x_{t3} x_{t2} = H_{t-1} \Delta_t$ to
represent an interaction term.

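Now the likelihood for the $\boldsymbol{\beta}_0$ vector is

$$L(\boldsymbol{\beta}_0; \mathbf{y}, \mathbf{x}) = \prod_{t=1}^{n} \left[ e^{-h_0(\mathbf{x}_t)} \right]^{y_t} \left[ 1 - e^{-h_0(\mathbf{x}_t)} \right]^{1 - y_t} . \tag{E-14}$$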



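Taking logs, we get

$$
\begin{aligned}
\ell(\boldsymbol{\beta}_0) &= \sum_{t=1}^{n} \left[ -y_t\, h_0(\mathbf{x}_t) + (1 - y_t) \ln\!\left( 1 - e^{-h_0(\mathbf{x}_t)} \right) \right] \\
&= -\sum_{t=1}^{n} \left[ y_t \exp\{\mathbf{x}_t \boldsymbol{\beta}_0\} - (1 - y_t) \ln\!\left( 1 - \exp[-\exp\{\mathbf{x}_t \boldsymbol{\beta}_0\}] \right) \right]
\end{aligned}
\tag{E-15}
$$

and this can be maximized by choice of $\boldsymbol{\beta}_0$, a non-linear
optimization task. The usual approach would involve
differentiation with respect to $\boldsymbol{\beta}_0$, and solving the resulting
non-linear system by a variation of the Newton-Raphson method.
Package programs are available for such a task.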
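A minimal sketch of this maximization is given below. It is not the package routine referred to above: it uses plain gradient ascent with step halving instead of Newton-Raphson, the function names are hypothetical, and the data are an invented intercept-only case (seven survivals, three changes) chosen because the answer is known in closed form, namely a fitted one-day survival probability equal to the observed fraction 0.7.

```python
import math

def hazard(beta, x):
    """h_0(x_t) = exp{x_t . beta}, positive for any beta (E-10)."""
    return math.exp(sum(b * v for b, v in zip(beta, x)))

def log_likelihood(beta, X, y):
    """Log-likelihood (E-15); y_t = 1 means the state survived day t."""
    total = 0.0
    for x, yt in zip(X, y):
        h = hazard(beta, x)
        # ln(1 - e^{-h}) is written with expm1 for accuracy at small h
        total += -h if yt == 1 else math.log(-math.expm1(-h))
    return total

def gradient(beta, X, y):
    """Partial derivatives of the log-likelihood with respect to beta."""
    g = [0.0] * len(beta)
    for x, yt in zip(X, y):
        h = hazard(beta, x)
        w = -1.0 if yt == 1 else math.exp(-h) / (-math.expm1(-h))
        for k, v in enumerate(x):
            g[k] += v * h * w
    return g

def fit(X, y, iters=500):
    """Gradient ascent with step halving (a stand-in for Newton-Raphson)."""
    n = len(y)
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        g = [gk / n for gk in gradient(beta, X, y)]
        current = log_likelihood(beta, X, y)
        step = 1.0
        while step > 1e-12:
            trial = [b + step * gk for b, gk in zip(beta, g)]
            if log_likelihood(trial, X, y) > current:
                beta = trial
                break
            step /= 2.0
        else:
            break  # no improving step found: treat as converged
    return beta

# Hypothetical intercept-only data: 7 survivals, 3 changes.
X = [[1.0]] * 10
y = [1] * 7 + [0] * 3
beta_hat = fit(X, y)
survival = math.exp(-hazard(beta_hat, [1.0]))
```

Rows of `X` with more columns (for example dewpoint depression or an interaction term) are handled by the same functions, although for larger problems a Newton-type method would normally converge in far fewer iterations.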

E.3 The Logistic Model from Cox's Model

Suppose a Cox model is under consideration for describing
the probability distribution of "age to death" or, in the present
context, the survival of a stratus (or no-stratus) episode for
another day. In a simple form, the probability of survival
through t + 1 in state j (j = 1, 0), given that state j has been
in effect for the past m time periods, is



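$$P\{Y_{t+1} = j \mid Y_t = j,\ Y_{t-1} = j, \ldots, Y_{t-(m-1)} = j,\ Y_{t-m} \ne j\} = e^{-h_j(m+1)} = \exp[-\lambda_j(t) \exp\{\mathbf{x}_t \boldsymbol{\beta}\}] . \tag{E-16}$$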



Ordinarily $\lambda_j(t)$ is thought of as a deterministic but
unknown function of t, i.e. time since start of the process.
In an application to stratus forecasting, and to other weather
phenomena, it may be desirable to allow a dependence of the basic
hazard rate upon m, the duration or "age" of the current
episode (stratus, or non-stratus as the case may be): $\lambda_j(m)$.
This necessitates a specification, either parametric or
non-parametric; the Cox procedure in Cox [1972] was to estimate
$\lambda_j(m)$ non-parametrically.

In order to associate the Cox model explicitly with the
logistic, adopt the attitude that $\lambda_j(m)$ is actually random,
and is independently distributed from period to period, with a
distribution characteristic of the state. In such a case we can
do no better than to attempt to estimate the model

$$P_j(m, \mathbf{x}_t) = E\left( \exp[-\lambda_j(m) \exp\{\mathbf{x}_t \boldsymbol{\beta}\}] \right) , \tag{E-17}$$



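where the expectation operator E(·) is over the distribution of
the now-random hazard. To be quite specific, allow $\lambda_j(m)$ to
have the Gamma distribution: for $\alpha_j, \gamma_j > 0$,

$$P\{y < \lambda_j(m) \le y + dy\} = \frac{e^{-\alpha_j y} (\alpha_j y)^{\gamma_j - 1}}{\Gamma(\gamma_j)}\, \alpha_j\, dy , \tag{E-18}$$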



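where $\alpha_j$ and $\gamma_j$ characterize the hazard variability when
state j is in effect. Now for this distribution the expectation
is explicitly in terms of the Laplace transform:

$$
\begin{aligned}
P_j(m, \mathbf{x}_t) &= \int_0^\infty \exp[-y \exp\{\mathbf{x}_t \boldsymbol{\beta}\}]\, \frac{e^{-\alpha_j y} (\alpha_j y)^{\gamma_j - 1}}{\Gamma(\gamma_j)}\, \alpha_j\, dy \\
&= \left( \frac{\alpha_j}{\alpha_j + \exp\{\mathbf{x}_t \boldsymbol{\beta}\}} \right)^{\gamma_j}
= \left( 1 + \frac{1}{\alpha_j} e^{\mathbf{x}_t \boldsymbol{\beta}} \right)^{-\gamma_j} .
\end{aligned}
\tag{E-19}
$$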



This is the probability of survival in state j for one more 
period (no change). 

Now the probability of a change is, using the above
randomizing model,

$$1 - P_j(m, \mathbf{x}_t) = 1 - \left( 1 + \frac{1}{\alpha_j} e^{\mathbf{x}_t \boldsymbol{\beta}} \right)^{-\gamma_j} \tag{E-20}$$



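and, in case $\gamma_j = 1$ (mixing by an exponential) we find

$$P\{Y_{t+1} \ne j \mid Y_t = j,\ \mathbf{X}_t = \mathbf{x}_t\} = \frac{\alpha_j^{-1} e^{\mathbf{x}_t \boldsymbol{\beta}}}{1 + \alpha_j^{-1} e^{\mathbf{x}_t \boldsymbol{\beta}}} \tag{E-21}$$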



which is precisely the logistic regression model. It is thus 
clear that the logistic regression model can arise from a plau- 
sible stochastic mechanism. Note that the derivation presents 
an alternative to the simple logistic model that incorporates one 
more parameter, thus possibly allowing for the better represen- 
tation of a wider range of binary response data than by the 
classical logistic. 
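The derivation can be checked numerically. The sketch below compares a Monte Carlo average of the Cox survival probability over a Gamma-distributed hazard with the closed form (E-19), and confirms that the exponential case reduces to a logistic function of the linear predictor shifted by the logarithm of the rate parameter. The parameter values, the seed, and the function names are arbitrary choices for illustration; `eta` stands for the linear predictor.

```python
import math
import random

def survival_closed_form(eta, alpha, gamma):
    """(E-19): [1 + exp(eta) / alpha]^(-gamma), with eta = x_t . beta."""
    return (1.0 + math.exp(eta) / alpha) ** (-gamma)

def survival_monte_carlo(eta, alpha, gamma, n=100_000, seed=0):
    """Average exp[-lambda * exp(eta)] over Gamma-distributed lambda."""
    rng = random.Random(seed)
    scale = 1.0 / alpha  # gammavariate takes (shape, scale); alpha is a rate
    total = 0.0
    for _ in range(n):
        lam = rng.gammavariate(gamma, scale)
        total += math.exp(-lam * math.exp(eta))
    return total / n

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Arbitrary illustrative values.
eta, alpha = 0.4, 2.0
mc = survival_monte_carlo(eta, alpha, gamma=1.5)
exact = survival_closed_form(eta, alpha, gamma=1.5)

# Exponential mixing (gamma = 1): change probability is logistic in eta.
change_prob = 1.0 - survival_closed_form(eta, alpha, gamma=1.0)
```

Values of the shape parameter other than one give the one-extra-parameter family mentioned above, which could in principle be fitted by maximum likelihood in the same way as the logistic model.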
E.4 The Cox Survival Model with Stable-Law Random Hazard.

It is of interest to investigate other ways of introducing
auxiliary randomness into the Cox proportional hazard survival
model. The process considered here represents model parameter
fluctuation from day to day (in the present application) that is
not covered by the simple representation

$$P\{Y_{t+1} = j \mid Y_t = j,\ \mathbf{X}_t = \mathbf{x}_t\} = \exp[-\lambda \exp\{\mathbf{x}_t \boldsymbol{\beta}\}] ; \tag{E-22}$$

instead the form of the randomized model is obtained by inserting a
term in the hazard:

$$P\{Y_{t+1} = j \mid Y_t = j,\ \mathbf{X}_t = \mathbf{x}_t,\ \varepsilon_t\} = \exp[-\lambda \varepsilon_t \exp\{\mathbf{x}_t \boldsymbol{\beta}\}] . \tag{E-23}$$

Now $\varepsilon_t$ is not directly observable or estimable if, as
is assumed, only one observation on a probability depending on
each $\varepsilon_t$ is available. Effectively one observes the marginal
probability of $Y_{t+1} = j$, given $Y_t = j$ and values of the
explanatory variables $\mathbf{X}_t$:

$$P\{Y_{t+1} = j \mid Y_t = j,\ \mathbf{X}_t = \mathbf{x}_t\} = E\left( \exp[-\lambda \varepsilon_t \exp\{\mathbf{x}_t \boldsymbol{\beta}\}] \right) . \tag{E-24}$$

Suppose now that $\varepsilon_t$ obeys a positive stable law
distribution (see Feller (1966), p. 170). In this case the Laplace
transform of $\varepsilon_t$ is always of the form

$$E\left[ e^{-s \varepsilon_t} \right] = e^{-(as)^{\gamma}} , \qquad 0 < \gamma < 1 . \tag{E-25}$$

Unfortunately, explicit formulas for the density of $\varepsilon_t$ are
generally not available; that for $\gamma = 1/2$ is an exception:

$$f_{\varepsilon_t}(x; a) = \frac{\sqrt{a}}{2\sqrt{\pi}}\, x^{-3/2}\, e^{-a/(4x)} , \qquad x > 0 . \tag{E-26}$$



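It follows generally and directly from (E-24) and (E-25)
that if $\varepsilon_t$ is positive stable the marginal probability of
one-day survival is

$$
\begin{aligned}
P\{Y_{t+1} = j \mid Y_t = j,\ \mathbf{X}_t = \mathbf{x}_t\} &= E\left( \exp[-\lambda \varepsilon_t \exp\{\mathbf{x}_t \boldsymbol{\beta}\}] \right) \\
&= \exp[-(\lambda a)^{\gamma} \exp\{\mathbf{x}_t \gamma \boldsymbol{\beta}\}] ,
\end{aligned}
\tag{E-27}
$$

once again exactly a Cox model (i.e. of the form (E-22)) but now
with the parameters

$$\lambda' = (\lambda a)^{\gamma} , \qquad \boldsymbol{\beta}' = \gamma \boldsymbol{\beta} . \tag{E-28}$$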

Thus the form of the particular Cox model discussed is completely
insensitive to the type of hazard randomization introduced here.
Notice that the effects of the explanatory variables or covariates,
$\mathbf{x}_t$, as measured by the magnitudes of their coefficients
($\boldsymbol{\beta} \to \gamma \boldsymbol{\beta}$, $\gamma < 1$), become progressively smaller as
$\gamma \to 0$; the latter "shrinkage" tendency is associated with
greater and greater "spread" of the $\varepsilon_t$ distribution (here
"spread" cannot be measured by variance, for the latter fails to
exist). It follows that the predictive (in terms of explanatory
variables) power of a Cox model could improve by reducing any
tendency towards hazard randomization of the type exhibited, if
such is possible.
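The shrinkage result (E-27) can be checked by simulation in the one case with a tractable density, $\gamma = 1/2$. The sketch relies on a standard fact that is not stated in the report: if Z is standard normal, then $(a/2)/Z^2$ has Laplace transform $\exp[-(as)^{1/2}]$. All numerical values, the seed, and the function names are arbitrary illustrative assumptions; `eta` stands for the linear predictor.

```python
import math
import random

def sample_stable_half(a, rng):
    """Draw eps with Laplace transform exp[-(a s)^(1/2)] as (a/2) / Z^2."""
    z = rng.gauss(0.0, 1.0)
    while z * z == 0.0:  # guard against division by zero
        z = rng.gauss(0.0, 1.0)
    return (a / 2.0) / (z * z)

def marginal_survival_mc(lam, eta, a, n=100_000, seed=0):
    """Monte Carlo version of (E-24) for gamma = 1/2."""
    rng = random.Random(seed)
    s = lam * math.exp(eta)
    total = 0.0
    for _ in range(n):
        total += math.exp(-s * sample_stable_half(a, rng))
    return total / n

def marginal_survival_exact(lam, eta, a, gamma=0.5):
    """(E-27): exp[-(lam a)^gamma exp(gamma eta)]."""
    return math.exp(-((lam * a) ** gamma) * math.exp(gamma * eta))

# Arbitrary illustrative values.
lam, eta, a = 0.3, 0.2, 1.0
mc = marginal_survival_mc(lam, eta, a)
exact = marginal_survival_exact(lam, eta, a)
```

In the exact formula the linear predictor enters only through half its value, which is the halving of the coefficients described by (E-28) for this choice of the stable index.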

Further work on randomized Cox models yielding binary time 
series will be reported elsewhere. 






APPENDIX F 



Spectral Analysis of Hourly Stratus Levels and Dew-Point
Depression for July-September 1958.

The data for the height of the stratus level are hourly
records, in units of hundreds of feet, of the height of the
stratus layer. There are 2208 such observations. The data are
integer valued with a minimum of three and a maximum of 999;
1410 of the observations are 999, which denotes the category of
no stratus (infinite height); the next largest observational
value is 888, of which there are 62; all the rest of the
observations are less than or equal to 180.

Logarithms of the stratus heights were taken to reduce
the range of the data. Figure (4) shows the ln (normalized
periodogram) of the transformed data (cf. Cox and Lewis (1966),
p. 99). If the data are uncorrelated and stationary then
the values of the normalized periodogram will appear independent
and have the unit exponential distribution. The line is at the
95% quantile for the maximum of 1104 independent unit exponentials.
The largest peak occurs at 91. Other peaks occur at 1 and 276.
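The threshold line and the reading of peak positions as cycle lengths can be sketched as follows. Two assumptions are made that the report does not state: the periodogram is normalized by N times the sample variance, and index k corresponds to k complete cycles over the 2208-hour record. The function names and the synthetic test series are illustrative only.

```python
import cmath
import math

def normalized_periodogram(x):
    """Return {k: I_k} for k = 1..N//2, scaled by N times the sample variance."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    var = sum(v * v for v in centered) / n
    pg = {}
    for k in range(1, n // 2 + 1):
        dft = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                  for t, v in enumerate(centered))
        pg[k] = abs(dft) ** 2 / (n * var)
    return pg

def max_quantile(n, level=0.95):
    """Quantile of the maximum of n independent unit exponentials."""
    return -math.log(1.0 - level ** (1.0 / n))

def period_hours(k, n_obs=2208):
    """Cycle length in hours, assuming index k means k cycles per record."""
    return n_obs / k

# Synthetic check: a pure 8-sample cycle in 64 points peaks at k = 64 / 8.
series = [math.cos(2 * math.pi * t / 8) for t in range(64)]
pg = normalized_periodogram(series)
peak = max(pg, key=pg.get)

threshold = max_quantile(1104)  # on the raw scale; the figure plots its logarithm
```

Under the stated indexing assumption, `period_hours(276)` is exactly eight hours and `period_hours(91)` is a little over 24 hours, in line with the interpretation of the peaks in this appendix.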

The peak at 91 suggests that a 24 hour cycle may be present; the
peak at 276 suggests an eight hour cycle. The peaks around 1 may
be attributable to the dependence of the data. A least squares
cyclic fit for the 24 and eight hour cycles was next carried out.
The residuals from the fit were then whitened, using an AR2
process. Figure (5) shows the log (normalized periodogram) of the
residuals following the cyclic fit and AR2 whitening. There are still
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